CURRENT STATUS DATA WITH COMPETING RISKS:
LIMITING DISTRIBUTION OF THE MLE
University of Washington and University of Washington

We study nonparametric estimation for current status data with 
competing risks. Our main interest is in the nonparametric maxi- 
mum likelihood estimator (MLE), and for comparison we also con- 
sider a simpler "naive estimator." Groeneboom, Maathuis and Well- 
ner [Ann. Statist. (2008) 36 1031-1063] proved that both types of 
estimators converge globally and locally at rate $n^{1/3}$. We use these
results to derive the local limiting distributions of the estimators. The 
limiting distribution of the naive estimator is given by the slopes of 
the convex minorants of correlated Brownian motion processes with 
parabolic drifts. The limiting distribution of the MLE involves a new 
self-induced limiting process. Finally, we present a simulation study 
showing that the MLE is superior to the naive estimator in terms of 
mean squared error, both for small sample sizes and asymptotically. 

1. Introduction. We study nonparametric estimation for current status 
data with competing risks. The set-up is as follows. We analyze a system 
that can fail from $K$ competing risks, where $K \in \mathbb N$ is fixed. The random
variables of interest are $(X, Y)$, where $X \in \mathbb R$ is the failure time of the system,
and $Y \in \{1, \dots, K\}$ is the corresponding failure cause. We cannot observe
$(X, Y)$ directly. Rather, we observe the "current status" of the system at a
single random observation time $T \in \mathbb R$, where $T$ is independent of $(X, Y)$.
This means that at time $T$, we observe whether or not failure occurred,
and if and only if failure occurred, we also observe the failure cause $Y$. Such
data arise naturally in cross-sectional studies with several failure causes, and
generalizations arise in HIV vaccine clinical trials (see [10]). 
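As an illustration of this observation scheme, the following minimal Python sketch converts a latent pair $(X, Y)$ and an inspection time $T$ into the observed current status. The status is encoded as the indicator vector $\Delta$ defined in Section 1.1. The helper names `observe`, `simulate` and `draw_xy` are hypothetical and do not come from the paper. For concreteness only, the usage lines assume the model of Example 2.3.

```python
import random


def observe(x, y, t, K):
    """Current status at inspection time t of a latent pair (x, y), y in {1, ..., K}.

    Returns (Delta_1, ..., Delta_{K+1}): Delta_k = 1{x <= t, y = k} for k <= K,
    and Delta_{K+1} = 1{x > t}. The cause y is revealed only if x <= t.
    """
    delta = [0] * (K + 1)
    if x <= t:
        delta[y - 1] = 1
    else:
        delta[K] = 1
    return tuple(delta)


def simulate(n, K, draw_xy, draw_t, seed=0):
    """Draw n i.i.d. observations (T, Delta), with T drawn independently of (X, Y)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x, y = draw_xy(rng)
        t = draw_t(rng)
        sample.append((t, observe(x, y, t, K)))
    return sample


# Illustrative model (as in Example 2.3): P(Y = k) = k/3, X | Y = k ~ Exp(k), T ~ Exp(1).
def draw_xy(rng):
    y = 1 if rng.random() < 1 / 3 else 2
    return rng.expovariate(y), y


data = simulate(1000, 2, draw_xy, lambda rng: rng.expovariate(1.0))
```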

We study nonparametric estimation of the sub-distribution functions $F_{01}, \dots, F_{0K}$,
where $F_{0k}(s) = P(X \le s, Y = k)$, $k = 1, \dots, K$. Various estimators
for this purpose were introduced in [10, 12], including the nonparametric
maximum likelihood estimator (MLE), which is our primary focus. For comparison
we also consider the "naive estimator," an alternative to the MLE
discussed in [12]. Characterizations, consistency and $n^{1/3}$ rates of convergence
of these estimators were established in Groeneboom, Maathuis and
Wellner [8]. In the current paper we use these results to derive the local 
limiting distributions of the estimators. 

1.1. Notation. The following notation is used throughout. The observed
data are denoted by $(T, \Delta)$, where $T$ is the observation time and
$\Delta = (\Delta_1, \dots, \Delta_{K+1})$ is an indicator vector defined by $\Delta_k = 1\{X \le T, Y = k\}$ for
$k = 1, \dots, K$, and $\Delta_{K+1} = 1\{X > T\}$. Let $(T^i, \Delta^i)$, $i = 1, \dots, n$, be $n$ i.i.d.
observations of $(T, \Delta)$, where $\Delta^i = (\Delta^i_1, \dots, \Delta^i_{K+1})$. Note that we use the
superscript $i$ as the index of an observation, and not as a power. The order
statistics of $T^1, \dots, T^n$ are denoted by $T_{(1)}, \dots, T_{(n)}$. Furthermore, $G$ is the
distribution of $T$, $G_n$ is the empirical distribution of $T^i$, $i = 1, \dots, n$, and $\mathbb P_n$ is
the empirical distribution of $(T^i, \Delta^i)$, $i = 1, \dots, n$. For any vector $(x_1, \dots, x_K) \in \mathbb R^K$
we define $x_+ = \sum_{k=1}^K x_k$, so that, for example, $\Delta_+ = \sum_{k=1}^K \Delta_k$ and
$F_{0+}(s) = \sum_{k=1}^K F_{0k}(s)$. For any $K$-tuple $F = (F_1, \dots, F_K)$ of sub-distribution
functions, we define $F_{K+1}(s) = \int_{u > s} dF_+(u) = F_+(\infty) - F_+(s)$.

We denote the right-continuous derivative of a function $f : \mathbb R \to \mathbb R$ by $f'$
(if it exists). For any function $f : \mathbb R \to \mathbb R$, we define the convex minorant of
$f$ to be the largest convex function that is pointwise bounded by $f$. For any
interval $I$, $D(I)$ denotes the collection of cadlag functions on $I$. Finally, we
use the following definition for integrals and indicator functions:

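Definition 1.1. Let $dA$ be a Lebesgue-Stieltjes measure, and let $W$ be
a Brownian motion process. For $t < t_0$, we define $1_{[t_0, t)}(u) = -1_{[t, t_0)}(u)$,
$\int_{[t_0, t)} dA(u) = -\int_{[t, t_0)} dA(u)$ and $\int_{[t_0, t)} dW(u) = -\int_{[t, t_0)} dW(u)$.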



1.2. Assumptions. We prove the local limiting distributions of the estimators
at a fixed point $t_0$, under the following assumptions: (a) The observation
time $T$ is independent of the variables of interest $(X, Y)$. (b) For each
$k = 1, \dots, K$, $0 < F_{0k}(t_0) < F_{0k}(\infty)$, and $F_{0k}$ and $G$ are continuously differentiable
at $t_0$ with positive derivatives $f_{0k}(t_0)$ and $g(t_0)$. (c) The system
cannot fail from two or more causes at the same time. Assumptions (a) and
(b) are essential for the development of the theory. Assumption (c) ensures
that the failure cause is well defined. This assumption can always be satisfied by
defining simultaneous failure from several causes as a new failure cause.

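1.3. The estimators. We first consider the MLE. The MLE $\widehat F_n = (\widehat F_{n1}, \dots, \widehat F_{nK})$
is defined by $l_n(\widehat F_n) = \max_{F \in \mathcal F_K} l_n(F)$, where

$$ l_n(F) = \int \Big\{ \sum_{k=1}^K \delta_k \log F_k(t) + (1 - \delta_+) \log\big(1 - F_+(t)\big) \Big\} \, d\mathbb P_n(t, \delta), \tag{1} $$

and $\mathcal F_K$ is the collection of $K$-tuples $F = (F_1, \dots, F_K)$ of sub-distribution
functions on $\mathbb R$ with $F_+ \le 1$. The naive estimator $\widetilde F_n = (\widetilde F_{n1}, \dots, \widetilde F_{nK})$ is
defined by $l_{nk}(\widetilde F_{nk}) = \max_{F_k \in \mathcal F} l_{nk}(F_k)$, for $k = 1, \dots, K$, where $\mathcal F$ is the
collection of distribution functions on $\mathbb R$, and

$$ l_{nk}(F_k) = \int \big\{ \delta_k \log F_k(t) + (1 - \delta_k) \log\big(1 - F_k(t)\big) \big\} \, d\mathbb P_n(t, \delta), \qquad k = 1, \dots, K. \tag{2} $$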
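Because $\mathbb P_n$ is the empirical distribution, (1) and (2) are plain averages over the sample. The minimal sketch below evaluates both objective functions. It assumes that candidate sub-distribution functions are passed as Python callables, which is a hypothetical interface chosen only for illustration. Terms with $\delta = 0$ are skipped, following the convention $0 \log 0 = 0$. The factor $1 - \delta_+$ is read off as $\delta_{K+1}$.

```python
import math


def loglik_mle(data, F):
    """Empirical version of l_n(F) in (1).

    data: list of (t, delta), delta = (delta_1, ..., delta_{K+1}) with exactly one entry 1.
    F: list of K callables; F[k](t) is the value at t of the (k+1)th sub-distribution function.
    """
    total = 0.0
    for t, delta in data:
        K = len(delta) - 1
        values = [F[k](t) for k in range(K)]
        for k in range(K):
            if delta[k]:  # skip the term 0 * log F_k(t)
                total += math.log(values[k])
        if delta[K]:  # 1 - delta_+ equals delta_{K+1}
            total += math.log(1.0 - sum(values))
    return total / len(data)


def loglik_naive(data, Fk, k):
    """Empirical version of l_nk(F_k) in (2); k is 1-based and Fk is a callable."""
    total = 0.0
    for t, delta in data:
        value = Fk(t)
        total += math.log(value) if delta[k - 1] else math.log(1.0 - value)
    return total / len(data)


sample = [(1.0, (1, 0, 0)), (2.0, (0, 0, 1))]
print(loglik_mle(sample, [lambda t: 0.25, lambda t: 0.25]))
```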

Note that $\widetilde F_{nk}$ only uses the $k$th entry of the $\Delta$-vector, and is simply the
MLE for the reduced current status data $(T, \Delta_k)$. Thus, the naive estimator
splits the optimization problem into $K$ separate well-known problems. The
MLE, on the other hand, estimates $F_{01}, \dots, F_{0K}$ simultaneously, accounting
for the fact that $\sum_{k=1}^K F_{0k}(s) = P(X \le s)$ is the overall failure time distribution.
This relation is incorporated both in the objective function $l_n(F)$ [via
the term $\log(1 - F_+)$] and in the space $\mathcal F_K$ over which $l_n(F)$ is maximized
(via the constraint $F_+ \le 1$).
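Since each $\widetilde F_{nk}$ is the current status MLE for $(T, \Delta_k)$, it can be computed from the standard characterization of that MLE as an isotonic (nondecreasing least-squares) regression of $\Delta_k$ on the ordered observation times. The sketch below assumes distinct observation times and uses the pool-adjacent-violators algorithm. The function names are hypothetical, and this is not necessarily the algorithm used for the computations in Section 4.

```python
def pava(y):
    """Nondecreasing least-squares fit to y (equal weights) by pool adjacent violators."""
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the mean of the previous block exceeds the mean of the last block
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)
    return fit


def naive_estimator(data, K):
    """Sorted observation times and, for k = 1..K, the naive estimate of F_0k at those times.

    data: list of (t, delta) with delta of length K + 1; assumes distinct times t.
    """
    data = sorted(data, key=lambda obs: obs[0])
    times = [t for t, _ in data]
    estimates = [pava([delta[k] for _, delta in data]) for k in range(K)]
    return times, estimates


toy = [(2.0, (0, 1, 0)), (1.0, (1, 0, 0)), (3.0, (0, 0, 1))]
times, estimates = naive_estimator(toy, K=2)
```

Because the $K$ fits are computed separately, nothing forces their values to sum to at most one. This contrasts with the constraint $F_+ \le 1$ that is built into $\mathcal F_K$ for the MLE.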

1.4. Main results. The main results in this paper are the local limiting
distributions of the MLE and the naive estimator. The limiting distribution
of $\widetilde F_{nk}$ corresponds to the limiting distribution of the MLE for the
reduced current status data $(T, \Delta_k)$. Thus, it is given by the slope of the
convex minorant of a two-sided Brownian motion process plus parabolic
drift ([9], Theorem 5.1, page 89), known as Chernoff's distribution. The
joint limiting distribution of $(\widetilde F_{n1}, \dots, \widetilde F_{nK})$ follows by noting that the $K$
Brownian motion processes have a multinomial covariance structure, since
$\Delta \mid T \sim \mathrm{Mult}_{K+1}\big(1, (F_{01}(T), \dots, F_{0,K+1}(T))\big)$. The drifted Brownian motion
processes and their convex minorants are specified in Definitions 1.2 and
1.5. The limiting distribution of the naive estimator is given in Theorem 1.6,
and is simply a $K$-dimensional version of the limiting distribution for current
status data. A formal proof of this result can be found in [14], Section 6.1.
Definition 1.2. Let $W = (W_1, \dots, W_K)$ be a $K$-tuple of two-sided Brownian
motion processes originating from zero, with mean zero and covariances

$$ E\{W_j(t) W_k(s)\} = (|s| \wedge |t|) \, 1\{st > 0\} \, \Sigma_{jk}, \qquad s, t \in \mathbb R, \; j, k \in \{1, \dots, K\}, \tag{3} $$

where $\Sigma_{jk} = g(t_0)^{-1} \{1\{j = k\} F_{0k}(t_0) - F_{0j}(t_0) F_{0k}(t_0)\}$. Moreover, $V = (V_1, \dots, V_K)$
is a vector of drifted Brownian motions, defined by

$$ V_k(t) = W_k(t) + \tfrac12 f_{0k}(t_0) t^2, \qquad k = 1, \dots, K. \tag{4} $$

Following the convention introduced in Section 1.1, we write $W_+ = \sum_{k=1}^K W_k$
and $V_+ = \sum_{k=1}^K V_k$. Finally, we use the shorthand notation $a_k = (F_{0k}(t_0))^{-1}$,
$k = 1, \dots, K + 1$.
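To make Definition 1.2 concrete, the following pure-Python sketch builds $\Sigma$ and the constants $a_k$. It then simulates $V$ on a grid of $[0, t_{\max}]$ from Gaussian increments with covariance $\Sigma\,dt$. The inputs are the values $F_{0k}(t_0)$, $f_{0k}(t_0)$ and $g(t_0)$, together with $F_{0+}(\infty)$. The latter defaults to 1, which is an assumption that holds in Example 2.3. All function names are hypothetical. Only the right half $t \ge 0$ is generated. By (3), the left half is independent of the right half and has the same law, so it can be produced in the same way with another seed.

```python
import math
import random


def sigma_matrix(F0, g0):
    """Sigma_jk = g(t0)^{-1} {1{j = k} F_0k(t0) - F_0j(t0) F_0k(t0)}, as in (3)."""
    K = len(F0)
    return [[((F0[j] if j == k else 0.0) - F0[j] * F0[k]) / g0 for k in range(K)]
            for j in range(K)]


def a_constants(F0, F0_plus_inf=1.0):
    """a_k = 1 / F_0k(t0) for k <= K, and a_{K+1} = 1 / F_{0,K+1}(t0),
    where F_{0,K+1}(t0) = F_0+(inf) - F_0+(t0)."""
    return [1.0 / f for f in F0] + [1.0 / (F0_plus_inf - sum(F0))]


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def simulate_V(F0, f0, g0, t_max=1.0, steps=1000, seed=0):
    """Grid approximation of V_k(t) = W_k(t) + f_0k(t0) t^2 / 2 on [0, t_max].

    Returns a list of (t, [V_1(t), ..., V_K(t)])."""
    rng = random.Random(seed)
    K = len(F0)
    L = cholesky(sigma_matrix(F0, g0))
    dt = t_max / steps
    W = [0.0] * K
    path = [(0.0, [0.0] * K)]
    for i in range(1, steps + 1):
        z = [rng.gauss(0.0, 1.0) for _ in range(K)]
        for j in range(K):
            # correlated increment with covariance Sigma * dt
            W[j] += math.sqrt(dt) * sum(L[j][m] * z[m] for m in range(j + 1))
        t = i * dt
        path.append((t, [W[k] + 0.5 * f0[k] * t * t for k in range(K)]))
    return path


# Rounded parameter values quoted for Example 2.3 at t0 = 1
F0, f0, g0 = [0.21, 0.58], [0.12, 0.18], 0.37
Sigma = sigma_matrix(F0, g0)
path = simulate_V(F0, f0, g0)
```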

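Remark 1.3. Note that $W$ is the limit of a rescaled version of $W_n = (W_{n1}, \dots, W_{nK})$,
and that $V$ is the limit of a recentered and rescaled version
of $V_n = (V_{n1}, \dots, V_{nK})$, where $W_{nk}$ and $V_{nk}$ are defined by (17) and (6) of
[8]:

$$ W_{nk}(t) = \int_{u \le t} \{\delta_k - F_{0k}(u)\} \, d\mathbb P_n(u, \delta), \qquad t \in \mathbb R, \; k = 1, \dots, K, $$

$$ V_{nk}(t) = \int_{u \le t} \delta_k \, d\mathbb P_n(u, \delta), \qquad t \in \mathbb R, \; k = 1, \dots, K. $$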

Remark 1.4. We define the correlation between Brownian motions $W_j$
and $W_k$, $j \ne k$, by

$$ r_{jk} = \frac{\Sigma_{jk}}{\sqrt{\Sigma_{jj} \Sigma_{kk}}} = -\sqrt{\frac{F_{0j}(t_0) F_{0k}(t_0)}{(1 - F_{0j}(t_0))(1 - F_{0k}(t_0))}}. \tag{5} $$

Thus, the Brownian motions are negatively correlated, and this negative
correlation becomes stronger as $t_0$ increases. In particular, it follows that
$r_{12} \to -1$ as $F_{0+}(t_0) \to 1$ in the case of $K = 2$ competing risks.
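The closed form in (5) can be checked numerically. The sketch below, with hypothetical function names, computes $r_{jk}$ both from (5) and directly from $\Sigma$, where $g(t_0)$ cancels. It uses the unrounded values of Example 2.3 at $t_0 = 1$. It also shows the approach to $-1$ when $K = 2$ and $F_{0+}(t_0) \to 1$.

```python
import math


def correlation(F0j, F0k):
    """r_jk in (5), j != k."""
    return -math.sqrt(F0j * F0k / ((1.0 - F0j) * (1.0 - F0k)))


def correlation_from_sigma(F0j, F0k, g0=1.0):
    """Sigma_jk / sqrt(Sigma_jj Sigma_kk) computed from Definition 1.2; g(t0) cancels."""
    s_jk = -F0j * F0k / g0
    s_jj = F0j * (1.0 - F0j) / g0
    s_kk = F0k * (1.0 - F0k) / g0
    return s_jk / math.sqrt(s_jj * s_kk)


# Example 2.3 at t0 = 1 (unrounded F_01(1) and F_02(1))
F01 = (1 / 3) * (1 - math.exp(-1))
F02 = (2 / 3) * (1 - math.exp(-2))
print(round(correlation(F01, F02), 4))  # -0.6028

# K = 2: the correlation tends to -1 as F_01(t0) + F_02(t0) tends to 1
for F0plus in (0.5, 0.9, 0.99, 0.999):
    print(F0plus, correlation(F0plus / 2, F0plus / 2))
```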

Definition 1.5. Let $\widetilde H = (\widetilde H_1, \dots, \widetilde H_K)$ be the vector of convex minorants
of $V$, that is, $\widetilde H_k$ is the convex minorant of $V_k$, for $k = 1, \dots, K$. Let
$\widetilde F = (\widetilde F_1, \dots, \widetilde F_K)$ be the vector of right derivatives of $\widetilde H$.
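Numerically, $\widetilde H_k$ and $\widetilde F_k$ can be approximated by taking the greatest convex minorant of a path of $V_k$ sampled on a fine grid over a bounded window. Both the truncation to a window and the discretization are approximations. The sketch below (helper names hypothetical) computes the minorant of the piecewise-linear interpolation of the grid points as a lower convex hull and reads off right derivatives. For simplicity, the usage example takes a standard two-sided Brownian motion with drift $t^2$ in place of a general $V_k$.

```python
import random
from bisect import bisect_right


def convex_minorant(xs, ys):
    """Vertices of the greatest convex minorant of the piecewise-linear
    interpolation of the points (xs[i], ys[i]); xs must be strictly increasing."""
    hull = []
    for p in zip(xs, ys):
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or above the chord from hull[-2] to p.
            if (x1 - x0) * (p[1] - y0) - (y1 - y0) * (p[0] - x0) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def right_slope(hull, t):
    """Right derivative at t of the minorant; requires hull[0][0] <= t < hull[-1][0]."""
    xs = [x for x, _ in hull]
    i = bisect_right(xs, t) - 1
    (x0, y0), (x1, y1) = hull[i], hull[i + 1]
    return (y1 - y0) / (x1 - x0)


# Usage: two-sided Brownian motion W with W(0) = 0 plus drift t^2, on a grid of [-3, 3].
rng = random.Random(1)
step = 0.001
xs = [-3 + i * step for i in range(6001)]
bm = [0.0]
for _ in range(6000):
    bm.append(bm[-1] + rng.gauss(0.0, step ** 0.5))
b0 = bm[3000]  # value at the grid point t = 0, subtracted so that W(0) = 0
ys = [b - b0 + x * x for x, b in zip(xs, bm)]
slope_at_zero = right_slope(convex_minorant(xs, ys), 0.0)
```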

Theorem 1.6. Under the assumptions of Section 1.2,

$$ n^{1/3}\{\widetilde F_n(t_0 + n^{-1/3} t) - F_0(t_0)\} \to_d \widetilde F(t) $$

in the Skorohod topology on $(D(\mathbb R))^K$.
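The centering and scaling in Theorems 1.6 and 1.8 can be written as a small helper, which is hypothetical and for illustration only. Plugging in the true distribution function instead of an estimator isolates the deterministic part of the localization. For large $n$, the localized curve is close to the line $f_{0k}(t_0)\,t$, around which the limit $\widetilde F_k(t)$ fluctuates.

```python
import math


def localize(F, F0_at_t0, t0, n, t):
    """n^{1/3} { F(t0 + n^{-1/3} t) - F_0(t0) }, the localized quantity in Theorems 1.6 and 1.8."""
    scale = n ** (1.0 / 3.0)
    return scale * (F(t0 + t / scale) - F0_at_t0)


def F0(s):
    """Illustrative distribution function 1 - exp(-s), with density exp(-s)."""
    return 1.0 - math.exp(-s)


for t in (-2.0, -1.0, 1.0, 2.0):
    # close to f_0(t0) * t = exp(-1) * t when n is large
    print(t, localize(F0, F0(1.0), 1.0, 10**9, t))
```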
The limiting distribution of the MLE is given by the slopes of a new self-induced
process $\widehat H = (\widehat H_1, \dots, \widehat H_K)$, defined in Theorem 1.7. We say that the
process $\widehat H$ is "self-induced," since each component $\widehat H_k$ is defined in terms
of the other components through $\widehat H_+ = \sum_{j=1}^K \widehat H_j$. Due to this self-induced
nature, existence and uniqueness of $\widehat H$ need to be formally established (Theorem 1.7).
The limiting distribution of the MLE is given in Theorem 1.8.
These results are proved in the remainder of the paper.

Theorem 1.7. There exists an almost surely unique $K$-tuple $\widehat H = (\widehat H_1, \dots, \widehat H_K)$
of convex functions with right-continuous derivatives $\widehat F = (\widehat F_1, \dots, \widehat F_K)$,
satisfying the following three conditions:

(i) $a_k \widehat H_k(t) + a_{K+1} \widehat H_+(t) \le a_k V_k(t) + a_{K+1} V_+(t)$, for $k = 1, \dots, K$, $t \in \mathbb R$.

(ii) $\int \{a_k \widehat H_k(t) + a_{K+1} \widehat H_+(t) - a_k V_k(t) - a_{K+1} V_+(t)\} \, d\widehat F_k(t) = 0$,
for $k = 1, \dots, K$.

(iii) For all $M > 0$ and $k = 1, \dots, K$, there are points $\tau_{1k} < -M$ and
$\tau_{2k} > M$ so that $a_k \widehat H_k(t) + a_{K+1} \widehat H_+(t) = a_k V_k(t) + a_{K+1} V_+(t)$ for $t = \tau_{1k}$
and $t = \tau_{2k}$.

Theorem 1.8. Under the assumptions of Section 1.2,

$$ n^{1/3}\{\widehat F_n(t_0 + n^{-1/3} t) - F_0(t_0)\} \to_d \widehat F(t) $$

in the Skorohod topology on $(D(\mathbb R))^K$.

Thus, the limiting distributions of the MLE and the naive estimator are
given by the slopes of the limiting processes $\widehat H$ and $\widetilde H$, respectively. In
order to compare $\widehat H$ and $\widetilde H$, we note that the convex minorant $\widetilde H_k$ of $V_k$
can be defined as the almost surely unique convex function $\widetilde H_k$ with
right-continuous derivative $\widetilde F_k$ that satisfies: (i) $\widetilde H_k(t) \le V_k(t)$ for all $t \in \mathbb R$, and
(ii) $\int \{\widetilde H_k(t) - V_k(t)\} \, d\widetilde F_k(t) = 0$. Comparing this to the definition of $\widehat H_k$ in
Theorem 1.7, we see that the definition of $\widehat H_k$ contains the extra terms $\widehat H_+$
and $V_+$, which come from the term $\log(1 - F_+(t))$ in the log likelihood (1).
The presence of $\widehat H_+$ in Theorem 1.7 causes $\widehat H$ to be self-induced. In contrast,
the processes $\widetilde H_k$ for the naive estimator depend only on $V_k$, so that $\widetilde H$ is
not self-induced. However, note that the processes $\widetilde H_1, \dots, \widetilde H_K$ are correlated,
since the Brownian motions $W_1, \dots, W_K$ are correlated (see Definition 1.2).

1.5. Outline. This paper is organized as follows. In Section 2 we discuss
the new self-induced limiting processes $\widehat H$ and $\widehat F$. We give various interpretations
of these processes and prove the uniqueness part of Theorem 1.7.
Section 3 establishes convergence of the MLE to its limiting distribution
(Theorem 1.8). Moreover, in this proof we automatically obtain existence
of $\widehat H$, hence completing the proof of Theorem 1.7. This approach to proving
existence of the limiting processes is different from the one followed by
[5, 6] for the estimation of convex functions, who establish existence and 
uniqueness of the limiting process before proving convergence. In Section 4 
we compare the estimators in a simulation study, and show that the MLE 
is superior to the naive estimator in terms of mean squared error, both for 
small sample sizes and asymptotically. We also discuss computation of the 
estimators in Section 4. Technical proofs are collected in Section 5. 

2. Limiting processes. We now discuss the new self-induced processes $\widehat H$
and $\widehat F$ in more detail. In Section 2.1 we give several interpretations of these
processes, and illustrate them graphically. In Section 2.2 we prove tightness
of $\{\widehat F_k(t) - f_{0k}(t_0) t\}$ and $\{\widehat H_k(t) - V_k(t)\}$, for $t \in \mathbb R$. These results are used in
Section 2.3 to prove almost sure uniqueness of $\widehat H$ and $\widehat F$.

2.1. Interpretations of $\widehat H$ and $\widehat F$. Let $k \in \{1, \dots, K\}$. Theorem 1.7(i) and
the convexity of $\widehat H_1, \dots, \widehat H_K$ imply that $a_k \widehat H_k + a_{K+1} \widehat H_+$ is a convex function that
lies below $a_k V_k + a_{K+1} V_+$. Hence, $a_k \widehat H_k + a_{K+1} \widehat H_+$ is bounded above by the
convex minorant of $a_k V_k + a_{K+1} V_+$. This observation leads directly to the
following proposition about the points of touch between $a_k \widehat H_k + a_{K+1} \widehat H_+$
and $a_k V_k + a_{K+1} V_+$:

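Proposition 2.1. For each $k = 1, \dots, K$, we define $\mathcal M_k$ and $\mathcal N_k$ by

$$ \mathcal M_k = \{\text{points of touch between } a_k V_k + a_{K+1} V_+ \text{ and its convex minorant}\}, \tag{6} $$

$$ \mathcal N_k = \{\text{points of touch between } a_k V_k + a_{K+1} V_+ \text{ and } a_k \widehat H_k + a_{K+1} \widehat H_+\}. \tag{7} $$

Then the following properties hold: (i) $\mathcal N_k \subseteq \mathcal M_k$, and (ii) at points $t \in \mathcal N_k$,
the right derivative of $a_k \widehat H_k(t) + a_{K+1} \widehat H_+(t)$ is bounded above by the right
derivative, and its left derivative is bounded below by the left derivative, of the
convex minorant of $a_k V_k(t) + a_{K+1} V_+(t)$.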

Since $a_k V_k + a_{K+1} V_+$ is a Brownian motion process plus parabolic drift,
the point process $\mathcal M_k$ is well known from [4]. On the other hand, little is
known about $\mathcal N_k$, due to the self-induced nature of this process. However,
Proposition 2.1(i) relates $\mathcal N_k$ to $\mathcal M_k$, and this allows us to deduce properties
of $\mathcal N_k$ and the associated processes $\widehat H_k$ and $\widehat F_k$. In particular, Proposition
2.1(i) implies that $\widehat F_k$ is piecewise constant, and that $\widehat H_k$ is piecewise
linear (Corollary 2.2). Moreover, Proposition 2.1(i) is essential for the proof
of Proposition 2.16, where it is used to establish expression (30). Proposition
2.1(ii) is not used in the sequel.
Corollary 2.2. For each $k \in \{1, \dots, K\}$, the following properties hold
almost surely: (i) $\mathcal N_k$ has no condensation points in a finite interval, and
(ii) $\widehat F_k$ is piecewise constant and $\widehat H_k$ is piecewise linear.

Proof. $\mathcal M_k$ is a stationary point process which, with probability 1, has
no condensation points in a finite interval (see [4]). Together with Proposition
2.1(i), this yields that with probability 1, $\mathcal N_k$ has no condensation
points in a finite interval. Conditions (i) and (ii) of Theorem 1.7 imply that
$\widehat F_k$ can only increase at points $t \in \mathcal N_k$. Hence, $\widehat F_k$ is piecewise constant and
$\widehat H_k$ is piecewise linear. □

Thus, conditions (i) and (ii) of Theorem 1.7 imply that $a_k \widehat H_k + a_{K+1} \widehat H_+$ is
a piecewise linear convex function, lying below $a_k V_k + a_{K+1} V_+$, and touching
$a_k V_k + a_{K+1} V_+$ whenever $\widehat F_k$ jumps. We illustrate these processes using the
following example with $K = 2$ competing risks:

Example 2.3. Let $K = 2$, and let $T$ be independent of $(X, Y)$. Let $T$, $Y$
and $X \mid Y$ be distributed as follows: $G(t) = 1 - \exp(-t)$, $P(Y = k) = k/3$ and
$P(X \le t \mid Y = k) = 1 - \exp(-kt)$ for $k = 1, 2$. This yields $F_{0k}(t) = (k/3)\{1 - \exp(-kt)\}$
for $k = 1, 2$.

Figure 1 shows the limiting processes $a_k V_k + a_{K+1} V_+$, $a_k \widehat H_k + a_{K+1} \widehat H_+$,
and $\widehat F_k$, for this model with $t_0 = 1$. The relevant parameters at the point
$t_0 = 1$ are

$$ F_{01}(1) = 0.21, \quad F_{02}(1) = 0.58, \quad f_{01}(1) = 0.12, \quad f_{02}(1) = 0.18, \quad g(1) = 0.37. $$
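These rounded parameter values follow directly from the model. As a quick check, the short sketch below (function names hypothetical) evaluates $F_{0k}$, its density $f_{0k}(t) = (k^2/3)\exp(-kt)$ and $g(t) = \exp(-t)$ at $t_0 = 1$.

```python
import math


def F0k(k, t):
    """F_0k(t) = (k/3) {1 - exp(-k t)} in Example 2.3, k = 1, 2."""
    return (k / 3) * (1 - math.exp(-k * t))


def f0k(k, t):
    """Density of F_0k: (k/3) * k * exp(-k t)."""
    return (k / 3) * k * math.exp(-k * t)


def g(t):
    """Density of T, which is Exp(1)."""
    return math.exp(-t)


t0 = 1.0
print([round(F0k(k, t0), 2) for k in (1, 2)])  # [0.21, 0.58]
print([round(f0k(k, t0), 2) for k in (1, 2)], round(g(t0), 2))  # [0.12, 0.18] 0.37
```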

The processes shown in Figure 1 are approximations, obtained by comput- 
ing the MLE for sample size n = 100,000 (using the algorithm described in 
Section 4), and then computing the localized processes $V_n^{\mathrm{loc}}$ and $\widehat H_n^{\mathrm{loc}}$ (see
Definition 3.1 ahead).

Note that $\widehat F_1$ has a jump around $-3$. This jump causes a change of slope in
$a_k \widehat H_k + a_{K+1} \widehat H_+$ for both components $k \in \{1, 2\}$, but only for $k = 1$ is there a
touch between $a_k \widehat H_k + a_{K+1} \widehat H_+$ and $a_k V_k + a_{K+1} V_+$. Similarly, $\widehat F_2$ has a jump
around $-1$. Again, this causes a change of slope in $a_k \widehat H_k + a_{K+1} \widehat H_+$ for both
components $k \in \{1, 2\}$, but only for $k = 2$ is there a touch between $a_k \widehat H_k + a_{K+1} \widehat H_+$
and $a_k V_k + a_{K+1} V_+$. The fact that $a_k \widehat H_k + a_{K+1} \widehat H_+$ has changes of
slope without touching $a_k V_k + a_{K+1} V_+$ implies that $a_k \widehat H_k + a_{K+1} \widehat H_+$ is not
the convex minorant of $a_k V_k + a_{K+1} V_+$.

It is possible to give convex minorant characterizations of $\widehat H$, but again
these characterizations are self-induced. Proposition 2.4(a) characterizes
Fig. 1. Limiting processes for the model given in Example 2.3 for $t_0 = 1$. The top row
shows the processes $a_k V_k + a_{K+1} V_+$ and $a_k \widehat H_k + a_{K+1} \widehat H_+$, around the dashed parabolic
drifts $a_k f_{0k}(t_0) t^2/2 + a_{K+1} f_{0+}(t_0) t^2/2$. The bottom row shows the slope processes $\widehat F_k$,
around dashed lines with slope $f_{0k}(t_0)$. The circles and crosses indicate jump points of $\widehat F_1$
and $\widehat F_2$, respectively. Note that $a_k \widehat H_k + a_{K+1} \widehat H_+$ touches $a_k V_k + a_{K+1} V_+$ whenever $\widehat F_k$ has
a jump, for $k = 1, 2$.



CURRENT STATUS COMPETING RISKS DATA (II) 9 
$\widehat H_k$ in terms of $\sum_{j=1}^K \widehat H_j$, and Proposition 2.4(b) characterizes $\widehat H_k$ in terms of $\sum_{j \ne k} \widehat H_j$.

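Proposition 2.4. $\widehat H$ satisfies the following convex minorant characterizations:

(a) For each $k = 1, \dots, K$, $\widehat H_k(t)$ is the convex minorant of

$$ V_k(t) + \frac{a_{K+1}}{a_k} \{V_+(t) - \widehat H_+(t)\}. \tag{8} $$

(b) For each $k = 1, \dots, K$, $\widehat H_k(t)$ is the convex minorant of

$$ V_k(t) + \frac{a_{K+1}}{a_k + a_{K+1}} \{V_+^{(-k)}(t) - \widehat H_+^{(-k)}(t)\}, \tag{9} $$

where $V_+^{(-k)}(t) = \sum_{j=1, j \ne k}^K V_j(t)$ and $\widehat H_+^{(-k)}(t) = \sum_{j=1, j \ne k}^K \widehat H_j(t)$.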
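Proof. Conditions (i) and (ii) of Theorem 1.7 are equivalent to

$$ \widehat H_k(t) \le V_k(t) + \frac{a_{K+1}}{a_k} \{V_+(t) - \widehat H_+(t)\}, \qquad t \in \mathbb R, $$

$$ \int \Big[ \widehat H_k(t) - V_k(t) - \frac{a_{K+1}}{a_k} \{V_+(t) - \widehat H_+(t)\} \Big] \, d\widehat F_k(t) = 0, $$

for $k = 1, \dots, K$. This gives characterization (a). Similarly, characterization
(b) holds since conditions (i) and (ii) of Theorem 1.7 are equivalent to

$$ \widehat H_k(t) \le V_k(t) + \frac{a_{K+1}}{a_k + a_{K+1}} \{V_+^{(-k)}(t) - \widehat H_+^{(-k)}(t)\}, \qquad t \in \mathbb R, $$

$$ \int \Big[ \widehat H_k(t) - V_k(t) - \frac{a_{K+1}}{a_k + a_{K+1}} \{V_+^{(-k)}(t) - \widehat H_+^{(-k)}(t)\} \Big] \, d\widehat F_k(t) = 0, $$

for $k = 1, \dots, K$. □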
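For completeness, the algebra behind the equivalence used for (b) can be written out as follows. Write $\widehat H_+ = \widehat H_k + \widehat H_+^{(-k)}$ and $V_+ = V_k + V_+^{(-k)}$. Then condition (i) of Theorem 1.7 becomes

$$ (a_k + a_{K+1}) \, \widehat H_k(t) + a_{K+1} \, \widehat H_+^{(-k)}(t) \le (a_k + a_{K+1}) \, V_k(t) + a_{K+1} \, V_+^{(-k)}(t), $$

and dividing by $a_k + a_{K+1} > 0$ gives

$$ \widehat H_k(t) \le V_k(t) + \frac{a_{K+1}}{a_k + a_{K+1}} \{V_+^{(-k)}(t) - \widehat H_+^{(-k)}(t)\}. $$

The integrand in condition (ii) equals $(a_k + a_{K+1})$ times the difference between the left- and right-hand sides of this last display. Hence (ii) becomes the corresponding integral equality. If instead condition (i) is divided directly by $a_k$, one obtains the form used for (a).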

Comparing the MLE and the naive estimator, we see that $\widetilde H_k$ is the convex
minorant of $V_k$, and $\widehat H_k$ is the convex minorant of $V_k + (a_{K+1}/a_k)\{V_+ - \widehat H_+\}$.
These processes are illustrated in Figure 2. The difference between the two
estimators lies in the extra term $(a_{K+1}/a_k)\{V_+ - \widehat H_+\}$, which is shown in
the bottom row of Figure 2. Apart from the factor $a_{K+1}/a_k$, this term is
the same for all $k = 1, \dots, K$. Furthermore, $a_{K+1}/a_k = F_{0k}(t_0)/F_{0,K+1}(t_0)$
is an increasing function of $t_0$, because $F_{0k}$ is nondecreasing and $F_{0,K+1}(t_0) = F_{0+}(\infty) - F_{0+}(t_0)$
is nonincreasing. The extra term $(a_{K+1}/a_k)\{V_+ - \widehat H_+\}$
is therefore more important for large values of $t_0$. This provides an explanation for
the simulation results shown in Figure 3 of Section 4, which indicate that
the MLE is superior to the naive estimator in terms of mean squared error,
especially for large values of $t$. Finally, note that $(a_{K+1}/a_k)\{V_+ - \widehat H_+\}$
appears to be nonnegative in Figure 2. In Proposition 2.5 we prove that this

Fig. 2. Limiting processes for the model given in Example 2.3 for $t_0 = 1$. The
top row shows the processes $V_k$ and their convex minorants $\widetilde H_k$ (gray), together with
$V_k + (a_{K+1}/a_k)(V_+ - \widehat H_+)$ and their convex minorants $\widehat H_k$ (black). The dashed lines depict
the parabolic drift $f_{0k}(t_0) t^2/2$. The middle row shows the slope processes $\widetilde F_k$ (gray) and
$\widehat F_k$ (black), which follow the dashed lines with slope $f_{0k}(t_0)$. The bottom row shows the
"correction term" $(a_{K+1}/a_k)(V_+ - \widehat H_+)$ for the MLE.
is indeed the case. In turn, this result implies that $\widetilde H_k \le \widehat H_k$ (Corollary 2.6),
as shown in the top row of Figure 2.
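The monotonicity of the weight $a_{K+1}/a_k = F_{0k}(t_0)/F_{0,K+1}(t_0)$ can be seen numerically in the model of Example 2.3. In this model $F_{0+}(\infty) = 1$, so $F_{0,K+1}(t_0) = 1 - F_{01}(t_0) - F_{02}(t_0)$. The grid of $t_0$ values and the function names below are illustrative choices.

```python
import math


def F0k(k, t):
    """F_0k(t) = (k/3) {1 - exp(-k t)} in Example 2.3."""
    return (k / 3) * (1 - math.exp(-k * t))


def ratio(k, t0):
    """a_{K+1} / a_k = F_0k(t0) / F_{0,K+1}(t0), with F_{0,K+1}(t0) = 1 - F_01(t0) - F_02(t0)."""
    return F0k(k, t0) / (1 - F0k(1, t0) - F0k(2, t0))


grid = [0.25, 0.5, 1.0, 2.0, 4.0]
for k in (1, 2):
    print(k, [round(ratio(k, t0), 3) for t0 in grid])
```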

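Proposition 2.5. $\widehat H_+(t) \le V_+(t)$ for all $t \in \mathbb R$.

Proof. Multiplying both sides by $F_{0k}(t_0) = a_k^{-1}$, we can write Theorem 1.7(i) as

$$ \widehat H_k(t) + \frac{F_{0k}(t_0)}{F_{0,K+1}(t_0)} \widehat H_+(t) \le V_k(t) + \frac{F_{0k}(t_0)}{F_{0,K+1}(t_0)} V_+(t), \qquad k = 1, \dots, K, \; t \in \mathbb R. $$

The statement then follows by summing over $k = 1, \dots, K$. This gives
$\{1 + F_{0+}(t_0)/F_{0,K+1}(t_0)\}\{\widehat H_+(t) - V_+(t)\} \le 0$, and the first factor is positive. □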

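Corollary 2.6. $\widetilde H_k(t) \le \widehat H_k(t)$ for all $k = 1, \dots, K$ and $t \in \mathbb R$.

Proof. Let $k \in \{1, \dots, K\}$ and recall that $\widetilde H_k$ is the convex minorant
of $V_k$. Since $V_+ - \widehat H_+ \ge 0$ by Proposition 2.5, it follows that $\widetilde H_k$ is a convex
function below $V_k + (a_{K+1}/a_k)\{V_+ - \widehat H_+\}$. Hence, it is bounded above by
the convex minorant $\widehat H_k$ of $V_k + (a_{K+1}/a_k)\{V_+ - \widehat H_+\}$ [Proposition 2.4(a)]. □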

Finally, we write the characterization of Theorem 1.7 in a way that is 
analogous to the characterization of the MLE in Proposition 4.8 of [8]. We 
do this to make a connection between the finite sample situation and the 
limiting situation. Using this connection, the proofs for the tightness results 
in Section 2.2 are similar to the proofs for the local rate of convergence in 
[8], Section 4.3. We need the following definition: 

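Definition 2.7. For $k = 1, \dots, K$ and $t \in \mathbb R$, we define

$$ F_{0k}(t) = f_{0k}(t_0) t \quad \text{and} \quad S_k(t) = a_k W_k(t) + a_{K+1} W_+(t). \tag{10} $$

Here, with slight abuse of notation, $F_{0k}$ denotes the linear function $t \mapsto f_{0k}(t_0) t$,
and $F_{0+}(t) = f_{0+}(t_0) t$ by the convention of Section 1.1. Note that $S_k$ is the limit
of a rescaled version of the process $S_{nk} = a_k W_{nk} + a_{K+1} W_{n+}$, defined in (18) of [8].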

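Proposition 2.8. For all $k = 1, \dots, K$, for each point $\tau_k \in \mathcal N_k$ [defined
in (7)] and for all $s \in \mathbb R$, we have

$$ \int_{\tau_k}^{s} \big[ a_k\{\widehat F_k(u) - F_{0k}(u)\} + a_{K+1}\{\widehat F_+(u) - F_{0+}(u)\} \big] \, du \le \int_{\tau_k}^{s} dS_k(u), \tag{11} $$

and equality must hold if $s \in \mathcal N_k$.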



Proof. Let $k \in \{1, \dots, K\}$. By Theorem 1.7(i), we have

$$ a_k \widehat H_k(t) + a_{K+1} \widehat H_+(t) \le a_k V_k(t) + a_{K+1} V_+(t), \qquad t \in \mathbb R, $$

where equality holds at $t = \tau_k \in \mathcal N_k$. Subtracting this expression for $t = \tau_k$
from the expression for $t = s$, we get

$$ \int_{\tau_k}^{s} \{a_k \widehat F_k(u) + a_{K+1} \widehat F_+(u)\} \, du \le \int_{\tau_k}^{s} \{a_k \, dV_k(u) + a_{K+1} \, dV_+(u)\}. $$

The result then follows by subtracting $\int_{\tau_k}^{s} \{a_k F_{0k}(u) + a_{K+1} F_{0+}(u)\} \, du$ from
both sides, and using that $dV_k(u) = F_{0k}(u) \, du + dW_k(u)$ [see (4)]. □

2.2. Tightness of $\widehat H$ and $\widehat F$. The main results of this section are tightness
of $\{\widehat F_k(t) - F_{0k}(t)\}$ (Proposition 2.9) and $\{\widehat H_k(t) - V_k(t)\}$ (Corollary 2.15),
for $t \in \mathbb R$. These results are used in Section 2.3 to prove that $\widehat H$ and $\widehat F$ are
almost surely unique.

Proposition 2.9. For every $\varepsilon > 0$ there is an $M > 0$ such that

$$ P(|\widehat F_k(t) - F_{0k}(t)| > M) < \varepsilon \quad \text{for } k = 1, \dots, K, \; t \in \mathbb R. $$

Proposition 2.9 is the limit version of Theorem 4.17 of [8], which gave
the $n^{1/3}$ local rate of convergence of $\widehat F_{nk}$. Hence, analogously to [8], proof of
Theorem 4.17, we first prove a stronger tightness result for the sum process
$\{\widehat F_+(t) - F_{0+}(t)\}$, $t \in \mathbb R$.



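Proposition 2.10. Let $\beta \in (0, 1)$ and define

$$ v(t) = \begin{cases} 1, & \text{if } |t| \le 1, \\ |t|^\beta, & \text{if } |t| > 1. \end{cases} \tag{12} $$

Then for every $\varepsilon > 0$ there is an $M > 0$ such that

$$ P\Big( \sup_{t \in \mathbb R} \frac{|\widehat F_+(t) - F_{0+}(t)|}{v(t - s)} \ge M \Big) < \varepsilon \quad \text{for } s \in \mathbb R. $$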

Proof. The organization of this proof is similar to the proof of Theorem
4.10 of [8]. Let $\varepsilon > 0$. We only prove the result for $s = 0$, since the proof for
$s \ne 0$ is equivalent, due to stationarity of the increments of Brownian motion.

It is sufficient to show that we can choose $M > 0$ such that

$$ P\big(\exists t \in \mathbb R : \widehat F_+(t) \notin (F_{0+}(t - Mv(t)), F_{0+}(t + Mv(t)))\big) = P\big(\exists t \in \mathbb R : |\widehat F_+(t) - F_{0+}(t)| \ge f_{0+}(t_0) M v(t)\big) < \varepsilon. $$

In fact, we only prove that there is an $M$ such that

$$ P\big(\exists t \in [0, \infty) : \widehat F_+(t) \ge F_{0+}(t + Mv(t))\big) < \frac{\varepsilon}{4}, $$
since the proofs for the inequality $\widehat F_+(t) \le F_{0+}(t - Mv(t))$ and the interval
$(-\infty, 0]$ are analogous. In turn, it is sufficient to show that there is an $m_1 > 0$
such that

$$ P\big(\exists t \in [j, j + 1) : \widehat F_+(t) \ge F_{0+}(t + Mv(t))\big) \le p_{jM}, \qquad j \in \mathbb N, \; M > m_1, \tag{13} $$

where $p_{jM}$ satisfies $\sum_{j=0}^\infty p_{jM} \to 0$ as $M \to \infty$. We prove (13) for

$$ p_{jM} = d_1 \exp\{-d_2 (Mv(j))^3\}, \tag{14} $$

where $d_1$ and $d_2$ are positive constants. Using the monotonicity of $\widehat F_+$, we
only need to show that $P(A_{jM}) \le p_{jM}$ for all $j \in \mathbb N$ and $M > m_1$, where

$$ A_{jM} = \{\widehat F_+(j + 1) \ge F_{0+}(s_{jM})\} \quad \text{and} \quad s_{jM} = j + Mv(j). \tag{15} $$

We now fix $M > 0$ and $j \in \mathbb N$, and define $\tau_{kj} = \max\{\mathcal N_k \cap (-\infty, j + 1]\}$, for
$k = 1, \dots, K$. These points are well defined by Theorem 1.7(iii) and Corollary
2.2(i). Without loss of generality, we assume that the sub-distribution
functions are labeled so that $\tau_{1j} \le \cdots \le \tau_{Kj}$. On the event $A_{jM}$, there
is a $k \in \{1, \dots, K\}$ such that $\widehat F_k(j + 1) \ge F_{0k}(s_{jM})$. Hence, we can define
$\ell \in \{1, \dots, K\}$ such that

$$ \widehat F_k(j + 1) < F_{0k}(s_{jM}), \qquad k = \ell + 1, \dots, K, \tag{16} $$

$$ \widehat F_\ell(j + 1) \ge F_{0\ell}(s_{jM}). \tag{17} $$

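Recall that $\widehat F$ must satisfy (11). Hence, $P(A_{jM})$ equals

$$ P\Big( \int_{\tau_{\ell j}}^{s_{jM}} \big[ a_\ell\{\widehat F_\ell(u) - F_{0\ell}(u)\} + a_{K+1}\{\widehat F_+(u) - F_{0+}(u)\} \big] \, du \le \int_{\tau_{\ell j}}^{s_{jM}} dS_\ell(u), \; A_{jM} \Big) $$

$$ \le P\Big( \int_{\tau_{\ell j}}^{s_{jM}} a_\ell\{\widehat F_\ell(u) - F_{0\ell}(u)\} \, du \le \int_{\tau_{\ell j}}^{s_{jM}} dS_\ell(u), \; A_{jM} \Big) \tag{18} $$

$$ + P\Big( \int_{\tau_{\ell j}}^{s_{jM}} \{\widehat F_+(u) - F_{0+}(u)\} \, du \le 0, \; A_{jM} \Big). \tag{19} $$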

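Using the definition of $\tau_{\ell j}$ and the fact that $\widehat F_\ell$ is monotone nondecreasing
and piecewise constant (Corollary 2.2), it follows that on the event $A_{jM}$
we have $\widehat F_\ell(u) \ge \widehat F_\ell(\tau_{\ell j}) = \widehat F_\ell(j + 1) \ge F_{0\ell}(s_{jM})$, for $u \ge \tau_{\ell j}$. Hence, we can
bound (18) above by

$$ P\Big( \int_{\tau_{\ell j}}^{s_{jM}} a_\ell\{F_{0\ell}(s_{jM}) - F_{0\ell}(u)\} \, du \le \int_{\tau_{\ell j}}^{s_{jM}} dS_\ell(u) \Big) $$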

14 P. GROENEBOOM, M. H. MAATHUIS AND J. A. WELLNER 



$$ \le P\Big( \inf_{w \le j + 1} \Big\{ \tfrac12 a_\ell f_{0\ell}(t_0) (s_{jM} - w)^2 - \int_w^{s_{jM}} dS_\ell(u) \Big\} \le 0 \Big). $$

For $m_1$ sufficiently large, this probability is bounded above by $p_{jM}/2$ for all
$M > m_1$ and $j \in \mathbb N$, by Lemma 2.11 below. Similarly, (19) is bounded by
$p_{jM}/2$, using Lemma 2.12 below. □
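The quadratic drift in the bound for (18) comes from an elementary integral. It holds because $F_{0\ell}$ is the linear function $u \mapsto f_{0\ell}(t_0) u$ of Definition 2.7:

$$ \int_{w}^{s_{jM}} a_\ell\{F_{0\ell}(s_{jM}) - F_{0\ell}(u)\} \, du = a_\ell f_{0\ell}(t_0) \int_{w}^{s_{jM}} (s_{jM} - u) \, du = \tfrac12 a_\ell f_{0\ell}(t_0) (s_{jM} - w)^2 . $$

Taking $w = \tau_{\ell j} \le j + 1$ and then the infimum over $w \le j + 1$ yields the last probability. This is the event of Lemma 2.11 with $k = \ell$ and $b = a_\ell f_{0\ell}(t_0)/2$, with the term $C(s_{jM} - w)^{3/2}$ omitted.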

Lemmas 2.11 and 2.12 are the key lemmas in the proof of Proposition 2.10.
They are the limit versions of Lemmas 4.13 and 4.14 of [8], and their proofs
are given in Section 5. The basic idea of Lemma 2.11 is that the positive
quadratic drift $b(s_{jM} - w)^2$ dominates the Brownian motion process $S_k$ and
the term $C(s_{jM} - w)^{3/2}$. Note that the lemma also holds when $C(s_{jM} - w)^{3/2}$
is omitted, since this term is positive for $M > 1$. In fact, in the proof of
Proposition 2.10 we only use the lemma without this term, but we need
the term $C(s_{jM} - w)^{3/2}$ in the proof of Proposition 2.9 ahead. The proof of
Lemma 2.12 relies on the system of component processes. Since it is very
similar to the proof of Lemma 4.14 of [8], we only point out the differences in
Section 5.

Lemma 2.11. Let $C > 0$ and $b > 0$. Then there exists an $m_1 > 0$ such
that for all $k = 1, \dots, K$, $M > m_1$ and $j \in \mathbb N$,

$$ P\Big( \inf_{w \le j + 1} \Big\{ b(s_{jM} - w)^2 - \int_w^{s_{jM}} dS_k(u) - C(s_{jM} - w)^{3/2} \Big\} \le 0 \Big) \le p_{jM}, $$

where $s_{jM} = j + Mv(j)$, and $S_k(\cdot)$, $v(\cdot)$ and $p_{jM}$ are defined by (10), (12)
and (14), respectively.

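Lemma 2.12. Let $\ell$ be defined by (16) and (17). There is an $m_1 > 0$
such that

$$ P\Big( \int_{\tau_{\ell j}}^{s_{jM}} \{\widehat F_+(u) - F_{0+}(u)\} \, du \le 0, \; A_{jM} \Big) \le p_{jM} \quad \text{for } M > m_1, \; j \in \mathbb N, $$

where $s_{jM} = j + Mv(j)$, $\tau_{\ell j} = \max\{\mathcal N_\ell \cap (-\infty, j + 1]\}$, and $v(\cdot)$, $p_{jM}$ and
$A_{jM}$ are defined by (12), (14) and (15), respectively.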
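The summability requirement on $p_{jM}$ stated below (13) can be checked numerically. The sketch uses the placeholder constants $d_1 = d_2 = 1$, since the proof only asserts that some positive constants work, and takes $\beta = 1/2$. It shows that the series $\sum_j p_{jM}$ shrinks rapidly as $M$ grows. Function names are hypothetical, and the series is truncated at a large index.

```python
import math


def v(t, beta=0.5):
    """v(t) from (12): 1 for |t| <= 1 and |t|**beta otherwise."""
    return 1.0 if abs(t) <= 1 else abs(t) ** beta


def p(j, M, d1=1.0, d2=1.0, beta=0.5):
    """p_jM = d1 * exp(-d2 * (M v(j))**3) from (14); d1 and d2 are placeholder constants."""
    return d1 * math.exp(-d2 * (M * v(j, beta)) ** 3)


def s(j, M, beta=0.5):
    """s_jM = j + M v(j), the point used in the event A_jM of (15)."""
    return j + M * v(j, beta)


def series(M, jmax=10_000, **kwargs):
    """Truncated sum of p_jM over j = 0, ..., jmax."""
    return sum(p(j, M, **kwargs) for j in range(jmax + 1))


for M in (1.0, 2.0, 3.0):
    print(M, series(M))
```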

In order to prove tightness of $\{\widehat F_k(t) - F_{0k}(t)\}$, $t \in \mathbb R$, we only need Proposition
2.10 to hold for one value of $\beta \in (0, 1)$, analogously to [8], Remark 4.12.
We therefore fix $\beta = 1/2$, so that $v(t) = 1 \vee \sqrt{|t|}$. Then Proposition 2.10 leads
to the following corollary, which is a limit version of Corollary 4.16 of [8]:
Corollary 2.13. For every $\varepsilon > 0$ there is a $C > 0$ such that

$$ P\Big( \sup_{u \in \mathbb R_+} \frac{\int_{s-u}^{s} |\widehat F_+(t) - F_{0+}(t)| \, dt}{u \vee u^{3/2}} > C \Big) < \varepsilon \quad \text{for } s \in \mathbb R. $$

This corollary allows us to complete the proof of Proposition 2.9. 

Proof of Proposition 2.9. Let $\varepsilon > 0$ and let $k \in \{1, \dots, K\}$. It is sufficient
to show that there is an $M > 0$ such that $P(\widehat F_k(t) \ge F_{0k}(t + M)) < \varepsilon$
and $P(\widehat F_k(t) \le F_{0k}(t - M)) < \varepsilon$ for all $t \in \mathbb R$. We only prove the first inequality,
since the proof of the second one is analogous. Thus, let $t \in \mathbb R$ and
$M > 1$, and define

$$ B_{kM} = \{\widehat F_k(t) \ge F_{0k}(t + M)\} \quad \text{and} \quad \tau_k = \max\{\mathcal N_k \cap (-\infty, t]\}. $$

Note that $\tau_k$ is well defined because of Theorem 1.7(iii) and Corollary 2.2(i).
We want to prove that $P(B_{kM}) < \varepsilon$. Recall that $\widehat F$ must satisfy (11). Hence,

$$ P(B_{kM}) = P\Big( \int_{\tau_k}^{t+M} \big[ a_k\{\widehat F_k(u) - F_{0k}(u)\} + a_{K+1}\{\widehat F_+(u) - F_{0+}(u)\} \big] \, du \le \int_{\tau_k}^{t+M} dS_k(u), \; B_{kM} \Big). \tag{20} $$

By Corollary 2.13, we can choose $C > 0$ such that, with high probability,

$$ \int_{\tau_k}^{t+M} |\widehat F_+(u) - F_{0+}(u)| \, du \le C(t + M - \tau_k)^{3/2}, \tag{21} $$

uniformly in $\tau_k \le t$, using that $u^{3/2} \ge u$ for $u \ge 1$. Moreover, on the event
$B_{kM}$, we have $\int_{\tau_k}^{t+M} \{\widehat F_k(u) - F_{0k}(u)\} \, du \ge \int_{\tau_k}^{t+M} \{F_{0k}(t + M) - F_{0k}(u)\} \, du = f_{0k}(t_0)(t + M - \tau_k)^2/2$,
yielding a positive quadratic drift. The statement
now follows by combining these facts with (20), and applying Lemma 2.11. □

Proposition 2.9 leads to the following corollary about the distance between
the jump points of $\widehat F_k$. The proof is analogous to the proof of Corollary 4.19
of [8], and is therefore omitted.

Corollary 2.14. For all $k = 1, \dots, K$, let $\tau_k^-(s)$ and $\tau_k^+(s)$ be, respectively,
the largest jump point $\le s$ and the smallest jump point $> s$ of $\widehat F_k$.
Then for every $\varepsilon > 0$ there is a $C > 0$ such that $P(\tau_k^+(s) - \tau_k^-(s) > C) < \varepsilon$,
for $k = 1, \dots, K$, $s \in \mathbb R$.
Combining Proposition 2.9 and Corollary 2.14 yields tightness of $\{\widehat H_k(t) - V_k(t)\}$, $t \in \mathbb R$:

Corollary 2.15. For every $\varepsilon > 0$ there is an $M > 0$ such that

$$ P(|\widehat H_k(t) - V_k(t)| > M) < \varepsilon \quad \text{for } k = 1, \dots, K, \; t \in \mathbb R. $$

2.3. Uniqueness of $\widehat H$ and $\widehat F$. We now use the tightness results of Section
2.2 to prove the uniqueness part of Theorem 1.7, as given in Proposition 2.16. 
The existence part of Theorem 1.7 will follow in Section 3. 

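Proposition 2.16. Let $\widehat H$ and $\widetilde H$ satisfy the conditions of Theorem 1.7.
Then $\widehat H = \widetilde H$ almost surely. (Throughout Section 2.3, $\widetilde H$ denotes a second
process satisfying the conditions of Theorem 1.7, not the limit of the naive estimator.)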

The proof of Proposition 2.16 relies on the following lemma: 

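Lemma 2.17. Let $\widehat H = (\widehat H_1, \dots, \widehat H_K)$ and $\widetilde H = (\widetilde H_1, \dots, \widetilde H_K)$ satisfy the
conditions of Theorem 1.7, and let $\widehat F = (\widehat F_1, \dots, \widehat F_K)$ and $\widetilde F = (\widetilde F_1, \dots, \widetilde F_K)$
be the corresponding derivatives. Then

$$ \sum_{k=1}^K a_k \int \{\widehat F_k(t) - \widetilde F_k(t)\}^2 \, dt + a_{K+1} \int \{\widehat F_+(t) - \widetilde F_+(t)\}^2 \, dt \le \liminf_{m \to \infty} \sum_{k=1}^K \{\psi_k(m) - \psi_k(-m)\}, \tag{22} $$

where $\psi_k : \mathbb R \to \mathbb R$ is defined by

$$ \psi_k(t) = \{\widehat F_k(t) - \widetilde F_k(t)\} \big[ a_k\{\widehat H_k(t) - \widetilde H_k(t)\} + a_{K+1}\{\widehat H_+(t) - \widetilde H_+(t)\} \big]. \tag{23} $$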
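Proof. We define the following functional:

$$ \phi_m(F) = \sum_{k=1}^K a_k \Big\{ \frac12 \int_{-m}^{m} F_k^2(t) \, dt - \int_{-m}^{m} F_k(t) \, dV_k(t) \Big\} + a_{K+1} \Big\{ \frac12 \int_{-m}^{m} F_+^2(t) \, dt - \int_{-m}^{m} F_+(t) \, dV_+(t) \Big\}, \qquad m \in \mathbb N. $$

Then, letting

$$ \widehat D_k(t) = a_k\{\widehat H_k(t) - V_k(t)\} + a_{K+1}\{\widehat H_+(t) - V_+(t)\}, \tag{24} $$

$$ \widetilde D_k(t) = a_k\{\widetilde H_k(t) - V_k(t)\} + a_{K+1}\{\widetilde H_+(t) - V_+(t)\}, \tag{25} $$

and using $\widetilde F_k^2 - \widehat F_k^2 = (\widetilde F_k - \widehat F_k)^2 + 2\widehat F_k(\widetilde F_k - \widehat F_k)$, we have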

4>n.{F) - (t>m{F) = E ^ r {Fk{t) - dt 

1. 1 J —rn 
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k=l 



(26) 



+ 



K 



{F+{t)-F+{t)}^dt 



/m 
{Fk{t)-Fk{t)}dDk{t). 
-m 



k=l 

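Using integration by parts, we rewrite the last term of the right-hand side
of (26) as

$$ \sum_{k=1}^K \Big[ \{\widetilde F_k(t) - \widehat F_k(t)\} \widehat D_k(t) \Big]_{t=-m}^{t=m} - \sum_{k=1}^K \int_{-m}^{m} \widehat D_k(t) \, d\{\widetilde F_k(t) - \widehat F_k(t)\} \ge \sum_{k=1}^K \Big[ \{\widetilde F_k(t) - \widehat F_k(t)\} \widehat D_k(t) \Big]_{t=-m}^{t=m}. \tag{27} $$

The inequality follows from: (a) $\int_{-m}^{m} \widehat D_k(t) \, d\widehat F_k(t) = 0$ by Theorem 1.7(ii), because the
integrand there is nonpositive and the integral over $\mathbb R$ vanishes, and (b) $\int_{-m}^{m} \widehat D_k(t) \, d\widetilde F_k(t) \le 0$, since $\widehat D_k(t) \le 0$ by Theorem
1.7(i) and $\widetilde F_k$ is monotone nondecreasing. Combining (26) and (27), and
using the same expressions with $\widehat F$ and $\widetilde F$ interchanged, yields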

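$$ 0 = \phi_m(\widetilde F) - \phi_m(\widehat F) + \phi_m(\widehat F) - \phi_m(\widetilde F) $$

$$ \ge \sum_{k=1}^K a_k \int_{-m}^{m} \{\widehat F_k(t) - \widetilde F_k(t)\}^2 \, dt + a_{K+1} \int_{-m}^{m} \{\widehat F_+(t) - \widetilde F_+(t)\}^2 \, dt + \sum_{k=1}^K \Big[ \{\widetilde F_k(t) - \widehat F_k(t)\} \widehat D_k(t) \Big]_{-m}^{m} + \sum_{k=1}^K \Big[ \{\widehat F_k(t) - \widetilde F_k(t)\} \widetilde D_k(t) \Big]_{-m}^{m}. $$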
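By writing out the right-hand side of this expression, we find that it is
equivalent to

$$ \sum_{k=1}^K a_k \int_{-m}^{m} \{\widehat F_k(t) - \widetilde F_k(t)\}^2 \, dt + a_{K+1} \int_{-m}^{m} \{\widehat F_+(t) - \widetilde F_+(t)\}^2 \, dt $$

$$ \le \sum_{k=1}^K \Big[ \{\widehat F_k(m) - \widetilde F_k(m)\}\{\widehat D_k(m) - \widetilde D_k(m)\} - \{\widehat F_k(-m) - \widetilde F_k(-m)\}\{\widehat D_k(-m) - \widetilde D_k(-m)\} \Big]. \tag{28} $$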

This inequality holds for all $m \in \mathbb N$, and hence we can take $\liminf_{m \to \infty}$. The
left-hand side of (28) is a monotone sequence in $m$, so that we can replace
$\liminf_{m \to \infty}$ by $\lim_{m \to \infty}$ there. The result then follows from the definitions of $\psi_k$,
$\widehat D_k$ and $\widetilde D_k$ in (23)-(25). □
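The last step of this proof can be made explicit. In the difference of (24) and (25), the terms involving $V_k$ and $V_+$ cancel:

$$ \widehat D_k(t) - \widetilde D_k(t) = a_k\{\widehat H_k(t) - \widetilde H_k(t)\} + a_{K+1}\{\widehat H_+(t) - \widetilde H_+(t)\}. $$

Hence, by (23), the two products in the $k$th summand on the right-hand side of (28) are

$$ \{\widehat F_k(\pm m) - \widetilde F_k(\pm m)\}\{\widehat D_k(\pm m) - \widetilde D_k(\pm m)\} = \psi_k(\pm m), $$

so the right-hand side of (28) equals $\sum_{k=1}^K \{\psi_k(m) - \psi_k(-m)\}$. Taking limits as described then gives (22).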

We are now ready to prove Proposition 2.16. The idea of the proof is to 
show that the right-hand side of (22) is almost surely equal to zero. We 
prove this in two steps. First, we show that it is of order $O_p(1)$, using the
tightness results of Proposition 2.9 and Corollary 2.15. Next, we show that 
the right-hand side is almost surely equal to zero. 

Proof of Proposition 2.16. We first show that the right-hand side
of (22) is of order $O_p(1)$. Let $k \in \{1, \dots, K\}$ and note that Proposition 2.9
yields that $\{\widehat F_k(m) - F_{0k}(m)\}$ and $\{\widetilde F_k(m) - F_{0k}(m)\}$ are of order $O_p(1)$, so
that also $\{\widehat F_k(m) - \widetilde F_k(m)\} = O_p(1)$. Similarly, Corollary 2.15 implies that
$\{\widehat H_k(m) - \widetilde H_k(m)\} = O_p(1)$. Using the same argument for $-m$, this proves
that the right-hand side of (22) is of order $O_p(1)$.

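We now show that the right-hand side of (22) is almost surely equal to
zero. Let $k \in \{1, \dots, K\}$. We only consider $|\widehat F_k(m) - \widetilde F_k(m)| \, |\widehat H_k(m) - \widetilde H_k(m)|$,
since the term $|\widehat F_k(m) - \widetilde F_k(m)| \, |\widehat H_+(m) - \widetilde H_+(m)|$ and the point $-m$ can be
treated analogously. It is sufficient to show that

$$ \liminf_{m \to \infty} P\big(|\widehat F_k(m) - \widetilde F_k(m)| \, |\widehat H_k(m) - \widetilde H_k(m)| > \eta\big) = 0 \quad \text{for all } \eta > 0. \tag{29} $$

Let $\widehat\tau_{mk}$ be the last jump point of $\widehat F_k$ before $m$, and let $\widetilde\tau_{mk}$ be the last jump
point of $\widetilde F_k$ before $m$. We define the following events:

$$ E_m = E_m(\varepsilon, \delta, C) = E_{1m}(\varepsilon) \cap E_{2m}(\delta) \cap E_{3m}(C), \quad \text{where} $$

$$ E_{1m} = E_{1m}(\varepsilon) = \Big\{ \int_{\widehat\tau_{mk} \wedge \widetilde\tau_{mk}}^{\infty} \{\widehat F_k(t) - \widetilde F_k(t)\}^2 \, dt < \varepsilon \Big\}, $$

$$ E_{2m} = E_{2m}(\delta) = \{m - (\widehat\tau_{mk} \vee \widetilde\tau_{mk}) > \delta\}, $$

$$ E_{3m} = E_{3m}(C) = \{|\widehat H_k(m) - \widetilde H_k(m)| < C\}. $$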

Let $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$. Since the right-hand side of (22) is of order $O_p(1)$,
it follows that $\int \{\widehat F_k(t) - \widetilde F_k(t)\}^2 \, dt = O_p(1)$ for every $k \in \{1, \dots, K\}$. This
implies that $\int_m^\infty \{\widehat F_k(t) - \widetilde F_k(t)\}^2 \, dt \to_p 0$ as $m \to \infty$. Together with the fact
that $m - \widehat\tau_{mk}$ and $m - \widetilde\tau_{mk}$ are of order $O_p(1)$ (Corollary 2.14), this implies that there is
an $m_1 > 0$ such that $P(E_{1m}(\varepsilon_1)^c) < \varepsilon_1$ for all $m > m_1$. Next, recall that
the points of jump of $\widehat F_k$ and $\widetilde F_k$ are contained in the set $\mathcal M_k$, defined in
Proposition 2.1. Letting $\tau_{mk}^{\mathcal M} = \max\{\mathcal M_k \cap (-\infty, m]\}$, we have

$$ P(E_{2m}(\delta)^c) \le P(m - \tau_{mk}^{\mathcal M} \le \delta). \tag{30} $$

The distribution of $m - \tau_{mk}^{\mathcal M}$ is independent of $m$, nondegenerate and continuous
(see [4]). Hence, we can choose $\delta > 0$ such that the probabilities in
(30) are bounded by $\varepsilon_2/2$ for all $m$. Furthermore, by tightness of $\{\widehat H_k(m) - \widetilde H_k(m)\}$,
there is a $C > 0$ such that $P(E_{3m}(C)^c) < \varepsilon_2/2$ for all $m$. This
implies that $P(E_m(\varepsilon_1, \delta, C)^c) < \varepsilon_1 + \varepsilon_2$ for $m > m_1$.
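Returning to (29), we now have for $\eta > 0$:

$$\liminf_{m\to\infty} P\big(|\hat F_k(m) - \tilde F_k(m)|\,|\hat H_k(m) - \tilde H_k(m)| > \eta\big)$$
$$\le \epsilon_1 + \epsilon_2 + \liminf_{m\to\infty} P\big(|\hat F_k(m) - \tilde F_k(m)|\,|\hat H_k(m) - \tilde H_k(m)| > \eta,\ E_m(\epsilon_1,\delta,C)\big)$$
$$\le \epsilon_1 + \epsilon_2 + \liminf_{m\to\infty} P\big(|\hat F_k(m) - \tilde F_k(m)| > \eta/C,\ E_m(\epsilon_1,\delta,C)\big),$$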

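using the definition of $E_{3m}(C)$ in the last line. The probability in the last line equals zero for $\epsilon_1$ small. To see this, note that $|\hat F_k(m) - \tilde F_k(m)| > \eta/C$, $m - \{\hat\tau_{mk}\vee\tilde\tau_{mk}\} > \delta$, and the fact that $\hat F_k$ and $\tilde F_k$ are constant on $[\hat\tau_{mk}\vee\tilde\tau_{mk}, m]$ (since $\hat\tau_{mk}$ and $\tilde\tau_{mk}$ are their last jump points before $m$) imply that

$$\int_{\hat\tau_{mk}\vee\tilde\tau_{mk}}^{\infty}\{\hat F_k(u) - \tilde F_k(u)\}^2\,du \ge \int_{\hat\tau_{mk}\vee\tilde\tau_{mk}}^{m}\{\hat F_k(u) - \tilde F_k(u)\}^2\,du > \frac{\eta^2\delta}{C^2},$$

so that $E_{1m}(\epsilon_1)$ cannot hold for $\epsilon_1 < \eta^2\delta/C^2$. Since $\epsilon_1 > 0$ and $\epsilon_2 > 0$ can be chosen arbitrarily small, this proves (29).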

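This proves that the right-hand side of (22) equals zero, almost surely. Since the left-hand side of (22) controls $\int\{\hat F_k(t) - \tilde F_k(t)\}^2\,dt$, as used above, this integral equals zero almost surely for each $k$. Together with the right-continuity of $\hat F_k$ and $\tilde F_k$, this implies that $\hat F_k = \tilde F_k$ almost surely, for $k = 1,\ldots,K$. Since $\hat F_k$ and $\tilde F_k$ are the right derivatives of $\hat H_k$ and $\tilde H_k$, this yields that $\hat H_k = \tilde H_k + c_k$ almost surely. Finally, both $\hat H_k$ and $\tilde H_k$ satisfy conditions (i) and (ii) of Theorem 1.7 for $k = 1,\ldots,K$, so that $c_1 = \cdots = c_K = 0$ and $\hat H = \tilde H$ almost surely. □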



3. Proof of the limiting distribution of the MLE. In this section we prove 
that the MLE converges to the limiting distribution given in Theorem 1.8. 
In the process, we also prove the existence part of Theorem 1.7. 

First, we recall from [8], Section 2.2, that the naive estimators $\tilde F_{nk}$, $k = 1,\ldots,K$, are unique at $t\in\{T_1,\ldots,T_n\}$, and that the MLEs $\hat F_{nk}$, $k = 1,\ldots,K$, are unique at $t\in\mathcal T_k$, where $\mathcal T_k = \{T_i,\ i = 1,\ldots,n : \Delta_k^i + \Delta_{K+1}^i > 0\}\cup\{T_{(n)}\}$ for $k = 1,\ldots,K$ (see [8], Proposition 2.3). To avoid issues with non-uniqueness, we adopt the convention that $\tilde F_{nk}$ and $\hat F_{nk}$, $k = 1,\ldots,K$, are piecewise constant and right-continuous, with jumps only at the points at which they are uniquely defined. This convention does not affect the asymptotic properties of the estimators under the assumptions of Section 1.2. Recalling the definitions of $G$ and $\mathbb G_n$ given in Section 1.1, we now define the following localized processes:
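Definition 3.1. For each $k = 1,\ldots,K$, we define

(31) $F^{loc}_{nk}(t) = n^{1/3}\{\hat F_{nk}(t_0 + n^{-1/3}t) - F_{0k}(t_0)\}$,

(32) $V^{loc}_{nk}(t) = \dfrac{n^{2/3}}{g(t_0)}\displaystyle\int_{u\in[t_0,\,t_0+n^{-1/3}t]}\{\delta_k - F_{0k}(t_0)\}\,d\mathbb P_n(u,\delta)$,

(33) $H^{loc}_{nk}(t) = \dfrac{n^{2/3}}{g(t_0)}\displaystyle\int_{t_0}^{t_0+n^{-1/3}t}\{\hat F_{nk}(u) - F_{0k}(t_0)\}\,dG(u)$,

(34) $\tilde H^{loc}_{nk}(t) = H^{loc}_{nk}(t) + \dfrac{c_{nk}}{a_k} - F_{0k}(t_0)\displaystyle\sum_{j=1}^{K}\dfrac{c_{nj}}{a_j}$,

where $c_{nk}$ is the difference between $a_kV^{loc}_{nk} + a_{K+1}V^{loc}_{n+}$ and $a_kH^{loc}_{nk} + a_{K+1}H^{loc}_{n+}$ at the last jump point $\tau_{nk}$ of $F^{loc}_{nk}$ before zero, that is,

(35) $c_{nk} = a_kV^{loc}_{nk}(\tau_{nk}-) + a_{K+1}V^{loc}_{n+}(\tau_{nk}-) - a_kH^{loc}_{nk}(\tau_{nk}) - a_{K+1}H^{loc}_{n+}(\tau_{nk})$.

Moreover, we define the vectors $F^{loc}_n = (F^{loc}_{n1},\ldots,F^{loc}_{nK})$, $V^{loc}_n = (V^{loc}_{n1},\ldots,V^{loc}_{nK})$ and $\tilde H^{loc}_n = (\tilde H^{loc}_{n1},\ldots,\tilde H^{loc}_{nK})$.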
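To make the localized process in (32) concrete, the following Python sketch evaluates it from a sample, using that integration against the empirical measure $\mathbb P_n$ is an average over the observations $(T_i, \Delta^i)$. The function name `v_loc` and its arguments are hypothetical; $t_0$, $g(t_0)$ and $F_{0k}(t_0)$ must be supplied by the user (in practice they are unknown), and for $t < 0$ we assume the usual convention that the integral over $[t_0, t_0 + n^{-1/3}t]$ means minus the integral over $(t_0 + n^{-1/3}t, t_0)$.

```python
from typing import Sequence


def v_loc(t: float, t0: float, times: Sequence[float], delta_k: Sequence[int],
          f0k_t0: float, g_t0: float) -> float:
    """Illustrative evaluation of the localized process in (32).

    times[i] is T_i, delta_k[i] is the cause-k indicator of observation i,
    f0k_t0 stands for F_0k(t0) and g_t0 for g(t0); both are assumed known.
    """
    n = len(times)
    s = t0 + n ** (-1.0 / 3.0) * t  # right end point t0 + n^{-1/3} t
    if t >= 0:
        # sum of (delta_k - F_0k(t0)) over observations in the closed window [t0, s]
        total = sum(d - f0k_t0 for u, d in zip(times, delta_k) if t0 <= u <= s)
    else:
        # assumed convention for t < 0: minus the sum over the window (s, t0)
        total = -sum(d - f0k_t0 for u, d in zip(times, delta_k) if s < u < t0)
    # P_n gives weight 1/n to each observation; rescale by n^{2/3} / g(t0)
    return n ** (2.0 / 3.0) / g_t0 * total / n


times = [0.5, 1.0, 1.2, 2.0]  # toy data
delta_1 = [0, 1, 1, 0]
print(v_loc(1.0, 1.1, times, delta_1, f0k_t0=0.5, g_t0=1.0))
```

In the toy example with $t_0 = 1.1$, only the observation at time 1.2 falls in the window for $t = 1$, so the value is $n^{-1/3}\cdot 0.5$ with $n = 4$.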

Note that $\tilde H^{loc}_{nk}$ differs from $H^{loc}_{nk}$ by a vertical shift, and that $(\tilde H^{loc}_{nk})'(t) = (H^{loc}_{nk})'(t) = F^{loc}_{nk}(t) + o(1)$. We now show that the MLE satisfies the characterization given in Proposition 3.2, which can be viewed as a recentered and rescaled version of the characterization in Proposition 4.8 of [8]. In the proof of Theorem 1.8 we will see that, as $n\to\infty$, this characterization converges to the characterization of the limiting process given in Theorem 1.7.

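Proposition 3.2. Let the assumptions of Section 1.2 hold, and let $m > 0$. Then, for $k = 1,\ldots,K$,

$$a_k\tilde H^{loc}_{nk}(t) + a_{K+1}\tilde H^{loc}_{n+}(t) \le a_kV^{loc}_{nk}(t-) + a_{K+1}V^{loc}_{n+}(t-) + R^{loc}_{nk}(t)\quad\text{for } t\in[-m,m],$$

$$\int_{-m}^{m}\{a_kV^{loc}_{nk}(t-) + a_{K+1}V^{loc}_{n+}(t-) + R^{loc}_{nk}(t) - a_k\tilde H^{loc}_{nk}(t) - a_{K+1}\tilde H^{loc}_{n+}(t)\}\,dF^{loc}_{nk}(t) = 0,$$

where $R^{loc}_{nk}(t) = o_p(1)$, uniformly in $t\in[-m,m]$.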

Proof. Let $m > 0$ and let $\tau_{nk}$ be the last jump point of $\hat F_{nk}$ before $t_0$. It follows from the characterization of the MLE in Proposition 4.8 of [8] that

(36) $\displaystyle\int_{[\tau_{nk},s)}\{a_k(\hat F_{nk}(u) - F_{0k}(u)) + a_{K+1}(\hat F_{n+}(u) - F_{0+}(u))\}\,dG(u) \le \int_{[\tau_{nk},s)}\{a_k(\delta_k - F_{0k}(u)) + a_{K+1}(\delta_+ - F_{0+}(u))\}\,d\mathbb P_n(u,\delta) + R_{nk}(\tau_{nk},s)$,

where equality holds if $s$ is a jump point of $\hat F_{nk}$. Using that $t_0 - \tau_{nk} = O_p(n^{-1/3})$ by [8], Corollary 4.19, it follows from [8], Corollary 4.20, that $R_{nk}(\tau_{nk},s) = o_p(n^{-2/3})$, uniformly in $s\in[t_0 - mn^{-1/3}, t_0 + mn^{-1/3}]$. We now add

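$$\int_{[\tau_{nk},s)}\{a_k(F_{0k}(u) - F_{0k}(t_0)) + a_{K+1}(F_{0+}(u) - F_{0+}(t_0))\}\,dG(u)$$
to both sides of (36). This gives

(37) $\displaystyle\int_{[\tau_{nk},s)}\{a_k(\hat F_{nk}(u) - F_{0k}(t_0)) + a_{K+1}(\hat F_{n+}(u) - F_{0+}(t_0))\}\,dG(u) \le \int_{[\tau_{nk},s)}\{a_k(\delta_k - F_{0k}(t_0)) + a_{K+1}(\delta_+ - F_{0+}(t_0))\}\,d\mathbb P_n(u,\delta) + R'_{nk}(\tau_{nk},s)$,

where equality holds if $s$ is a jump point of $\hat F_{nk}$, and where
$$R'_{nk}(s,t) = R_{nk}(s,t) + \rho_{nk}(s,t),$$
with
$$\rho_{nk}(s,t) = \int_{[s,t)}\{a_k(F_{0k}(t_0) - F_{0k}(u)) + a_{K+1}(F_{0+}(t_0) - F_{0+}(u))\}\,d(\mathbb G_n - G)(u).$$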

Note that $\rho_{nk}(\tau_{nk},s) = o_p(n^{-2/3})$, uniformly in $s\in[t_0 - mn^{-1/3}, t_0 + mn^{-1/3}]$, using (29) in [8], Lemma 4.9, and $t_0 - \tau_{nk} = O_p(n^{-1/3})$ by [8], Corollary 4.19. Hence, the remainder term $R'_{nk}$ in (37) is of the same order as $R_{nk}$. Next, consider (37), write $\int_{[\tau_{nk},s)} = \int_{[\tau_{nk},t_0)} + \int_{[t_0,s)}$ with $s = t_0 + n^{-1/3}t$, and multiply by $n^{2/3}/g(t_0)$. This yields

(38) $c_{nk} + a_kH^{loc}_{nk}(t) + a_{K+1}H^{loc}_{n+}(t) \le R^{loc}_{nk}(t) + a_kV^{loc}_{nk}(t-) + a_{K+1}V^{loc}_{n+}(t-)$,

where equality holds if $t$ is a jump point of $F^{loc}_{nk}$, and where

(39) $R^{loc}_{nk}(t) = \{n^{2/3}/g(t_0)\}\,R'_{nk}(\tau_{nk}, t_0 + n^{-1/3}t)$, $k = 1,\ldots,K$.

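Note that $R^{loc}_{nk}(t) = o_p(1)$ uniformly in $t\in[-m,m]$, using again that $t_0 - \tau_{nk} = O_p(n^{-1/3})$. Moreover, note that $R^{loc}_{nk}$ is left-continuous. We now remove the random variables $c_{nk}$ by solving the following system of equations for $\tilde H_{n1},\ldots,\tilde H_{nK}$:

$$c_{nk} + a_kH^{loc}_{nk}(t) + a_{K+1}H^{loc}_{n+}(t) = a_k\tilde H_{nk}(t) + a_{K+1}\tilde H_{n+}(t),\qquad k = 1,\ldots,K.$$

The unique solution is $\tilde H_{nk}(t) = H^{loc}_{nk}(t) + c_{nk}/a_k - F_{0k}(t_0)\sum_{j=1}^{K}c_{nj}/a_j = \tilde H^{loc}_{nk}(t)$. Substituting this solution into (38), and using that equality holds in (38) at the jump points of $F^{loc}_{nk}$, yields Proposition 3.2.
□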
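The closed-form solution of the linear system at the end of this proof can be checked numerically. Write $d_k$ for the vertical shift of the $k$th component, so that the system reads $a_kd_k + a_{K+1}\sum_{j=1}^K d_j = c_{nk}$. Assuming $a_k = 1/F_{0k}(t_0)$ for $k\le K$ and $a_{K+1} = 1/(1 - F_{0+}(t_0))$ (consistent with the ratio $a_{K+1}/a_k = F_{0k}/F_{0,K+1}$ quoted in Section 4), dividing by $a_k$ and summing over $k$ gives $\sum_j d_j = a_{K+1}^{-1}\sum_j c_{nj}/a_j$, because $1 + a_{K+1}F_{0+}(t_0) = a_{K+1}$; substituting back gives $d_k = c_{nk}/a_k - F_{0k}(t_0)\sum_j c_{nj}/a_j$. The Python sketch below (hypothetical helper names, exact rational arithmetic, arbitrary illustrative inputs) implements this formula and verifies the residuals of the original system.

```python
from fractions import Fraction
from typing import List, Sequence


def shifts(f0: Sequence[Fraction], c: Sequence[Fraction]) -> List[Fraction]:
    """d_k = c_k/a_k - F_0k(t0) * sum_j c_j/a_j, with a_k = 1/F_0k(t0) (assumed)."""
    a = [1 / fk for fk in f0]
    s = sum((ck / ak for ck, ak in zip(c, a)), Fraction(0))
    return [ck / ak - fk * s for ck, ak, fk in zip(c, a, f0)]


def residuals(f0: Sequence[Fraction], c: Sequence[Fraction],
              d: Sequence[Fraction]) -> List[Fraction]:
    """Residuals a_k d_k + a_{K+1} d_+ - c_k of the system; all zero for a solution."""
    a = [1 / fk for fk in f0]
    a_next = 1 / (1 - sum(f0, Fraction(0)))  # a_{K+1} = 1 / (1 - F_0+(t0)), assumed
    d_plus = sum(d, Fraction(0))
    return [ak * dk + a_next * d_plus - ck for ak, dk, ck in zip(a, d, c)]


f0 = [Fraction(1, 5), Fraction(3, 10)]  # hypothetical values of F_0k(t0), K = 2
c = [Fraction(2), Fraction(-1)]         # hypothetical values of c_nk
d = shifts(f0, c)
print(d)                                # [Fraction(19, 50), Fraction(-33, 100)]
print(residuals(f0, c, d))              # [Fraction(0, 1), Fraction(0, 1)]
```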

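Definition 3.3. We define $U_n = (R^{loc}_n, V^{loc}_n, \tilde H^{loc}_n, F^{loc}_n)$, where $R^{loc}_n = (R^{loc}_{n1},\ldots,R^{loc}_{nK})$ with $R^{loc}_{nk}$ defined by (39), and where $V^{loc}_n$, $\tilde H^{loc}_n$ and $F^{loc}_n$ are given in Definition 3.1. We use the notation $\cdot|[-m,m]$ to denote that processes are restricted to $[-m,m]$.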

We now define a space for $U_n|[-m,m]$:

Definition 3.4. For any interval $I$, let $D^-(I)$ be the collection of "càglàd" functions on $I$ (left-continuous with right limits), and let $C(I)$ denote the collection of continuous functions on $I$. For $m\in\mathbb N$, we define the space

$$E[-m,m] = (D^-[-m,m])^K\times(D[-m,m])^K\times(C[-m,m])^K\times(D[-m,m])^K = \mathrm{I}\times\mathrm{II}\times\mathrm{III}\times\mathrm{IV},$$

endowed with the product topology induced by the uniform topology on $\mathrm{I}\times\mathrm{II}\times\mathrm{III}$, and the Skorohod topology on $\mathrm{IV}$.

Proof of Theorem 1.8. Analogously to the work of [6], proof of Theorem 6.2, on the estimation of convex densities, we first show that $U_n|[-m,m]$ is tight in $E[-m,m]$ for each $m\in\mathbb N$. Since $R^{loc}_n|[-m,m] = o_p(1)$ by Proposition 3.2, it follows that $R^{loc}_n|[-m,m]$ is tight in $(D^-[-m,m])^K$ endowed with the uniform topology. Next, note that the subset of $D[-m,m]$ consisting of absolutely bounded nondecreasing functions is compact in the Skorohod topology. Hence, the local rate of convergence of the MLE (see [8], Theorem 4.17) and the monotonicity of $F^{loc}_{nk}$, $k = 1,\ldots,K$, yield tightness of $F^{loc}_n|[-m,m]$ in the space $(D[-m,m])^K$ endowed with the Skorohod topology. Moreover, since the set of absolutely bounded continuous functions with absolutely bounded derivatives is compact in $C[-m,m]$ endowed with the uniform topology, it follows that $H^{loc}_n|[-m,m]$ is tight in $(C[-m,m])^K$ endowed with the uniform topology. Furthermore, $V^{loc}_n|[-m,m]$ is tight in $(D[-m,m])^K$ endowed with the uniform topology, since $V^{loc}_n(t)\to_d V(t)$ uniformly on compacta. Finally, $c_{n1},\ldots,c_{nK}$ are tight, since each $c_{nk}$ is the difference of quantities that are tight, using that $t_0 - \tau_{nk} = O_p(n^{-1/3})$ by [8], Corollary 4.19. Hence, also $\tilde H^{loc}_n|[-m,m]$ is tight in $(C[-m,m])^K$ endowed with the uniform topology. Combining everything, it follows that $U_n|[-m,m]$ is tight in $E[-m,m]$ for each $m\in\mathbb N$.

It now follows by a diagonal argument that any subsequence $U_{n'}$ of $U_n$ has a further subsequence $U_{n''}$ that converges in distribution to a limit

$$U = (0, V, H, F)\in(C(\mathbb R))^K\times(C(\mathbb R))^K\times(C(\mathbb R))^K\times(D(\mathbb R))^K.$$

Using a representation theorem (see, e.g., [2], [15], Representation Theorem 13, page 71, or [17], Theorem 1.10.4, page 59), we can assume that $U_{n''}\to_{a.s.} U$. Hence, $F = H'$ at continuity points of $F$, since the derivatives of a sequence of convex functions converge together with the convex functions at points where the limit has a continuous derivative. Proposition 3.2 and the continuous mapping theorem imply that the vector $(V, H, F)$ must satisfy

$$\inf_{t\in[-m,m]}\{a_kV_k(t) + a_{K+1}V_+(t) - a_kH_k(t) - a_{K+1}H_+(t)\}\ge 0,$$

$$\int_{-m}^{m}\{a_kV_k(t) + a_{K+1}V_+(t) - a_kH_k(t) - a_{K+1}H_+(t)\}\,dF_k(t) = 0,$$

for all $m\in\mathbb N$, where we replaced $V_k(t-)$ by $V_k(t)$, since $V_1,\ldots,V_K$ are continuous.

Letting $m\to\infty$, it follows that $H_1,\ldots,H_K$ satisfy conditions (i) and (ii) of Theorem 1.7. Furthermore, Theorem 1.7(iii) is satisfied since $t_0 - \tau_{nk} = O_p(n^{-1/3})$ by [8], Corollary 4.19. Hence, there exists a $K$-tuple of processes $(H_1,\ldots,H_K)$ that satisfies the conditions of Theorem 1.7. This proves the existence part of Theorem 1.7. Moreover, Proposition 2.16 implies that there is only one such $K$-tuple. Thus, each subsequence converges to the same limit $H = (H_1,\ldots,H_K) = (\hat H_1,\ldots,\hat H_K)$ defined in Theorem 1.8. In particular, this implies that $F^{loc}_n(t) = n^{1/3}\{\hat F_n(t_0 + n^{-1/3}t) - F_0(t_0)\}\to_d \hat F(t)$ in the Skorohod topology on $(D(\mathbb R))^K$. □

4. Simulations. We simulated 1000 data sets of sizes $n = 250$, 2500 and 25,000 from the model given in Example 2.3. For each data set, we computed the MLE and the naive estimator. For computation of the naive estimator, see [1], pages 13-15, and [9], pages 40-41. Various algorithms for the computation of the MLE are proposed in [10, 11, 12]. However, in order to handle large data sets, we use a different approach. We view the problem as a bivariate censored data problem, and use a method based on sequential quadratic programming and the support reduction algorithm of [7]. Details
are discussed in [13], Chapter 5. As convergence criterion we used satisfac- 
tion of the characterization in [8], Corollary 2.8, within a tolerance of 10"^'^. 
Both estimators were assumed to be piecewise constant, as discussed in the 
beginning of Section 3. 
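For readers who want to reproduce the naive estimator, the following Python sketch assumes the standard definition of the naive estimator of $F_{0k}$ as the MLE for the reduced current status data $(T_i, \Delta_k^i)$, $i = 1,\ldots,n$. For such data the MLE at the ordered observation times is the isotonic (nondecreasing least squares) regression of the indicators $\Delta_k^i$, which the pool-adjacent-violators algorithm computes. This is a generic textbook implementation, not the code used for the simulations reported here; it assumes distinct observation times, and the helper names `pava` and `naive_estimator` are hypothetical.

```python
from typing import List, Sequence, Tuple


def pava(y: Sequence[float]) -> List[float]:
    """Nondecreasing least-squares fit to y via pool-adjacent-violators."""
    blocks: List[Tuple[float, int]] = []  # (block mean, block size)
    for value in y:
        blocks.append((float(value), 1))
        # Merge adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2))
    fit: List[float] = []
    for mean, size in blocks:
        fit.extend([mean] * size)
    return fit


def naive_estimator(times: Sequence[float], deltas: Sequence[Sequence[int]]):
    """Return the sorted times and, for each cause, the fitted values there.

    deltas[i][k] is the indicator that observation i failed from cause k + 1
    by time times[i]; distinct observation times are assumed.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    sorted_times = [times[i] for i in order]
    n_causes = len(deltas[0])
    fits = [pava([deltas[i][k] for i in order]) for k in range(n_causes)]
    return sorted_times, fits


times = [3.0, 1.0, 2.0, 4.0]
deltas = [(0, 1), (1, 0), (0, 0), (1, 0)]  # toy data, K = 2
t_sorted, fits = naive_estimator(times, deltas)
print(t_sorted)  # [1.0, 2.0, 3.0, 4.0]
print(fits[0], fits[1])
```

In this toy example the two fitted components at the largest time are 1 and 0.5, so their sum exceeds 1; each component is a proper monotone fit, but nothing constrains their sum.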

It was suggested by [12] that the naive estimator can be improved by suitably modifying it when the sum of its components exceeds 1. In order to investigate this idea, we define a "scaled naive estimator" $\tilde F^{s}_{nk}$ by

$$\tilde F^{s}_{nk}(t) = \begin{cases}\tilde F_{nk}(t), & \text{if } \tilde F_{n+}(s_0)\le 1,\\ \tilde F_{nk}(t)/\tilde F_{n+}(s_0), & \text{if } \tilde F_{n+}(s_0) > 1,\end{cases}$$

for $k = 1,\ldots,K$, where we take $s_0 = 3$. Note that $\tilde F^{s}_{n+}(t)\le 1$ for $t\le 3$.
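As a minimal sketch of the scaled naive estimator, assume the naive estimator has already been evaluated at a point $t$ and at $s_0$ (for instance, from an isotonic-regression fit). The rescaling then only requires the summed value at $s_0$. The function name and argument layout below are hypothetical.

```python
from typing import List, Sequence


def scaled_naive(values_at_t: Sequence[float], values_at_s0: Sequence[float]) -> List[float]:
    """Scale the K naive-estimator components at a point t.

    values_at_t[k] and values_at_s0[k] are the naive estimates of F_0k at t and
    at s0. If their sum at s0 exceeds 1, every component is divided by that sum.
    """
    total_at_s0 = sum(values_at_s0)
    if total_at_s0 <= 1:
        return list(values_at_t)
    return [v / total_at_s0 for v in values_at_t]


print(scaled_naive([0.6, 0.4], [0.75, 0.5]))  # components divided by 1.25
```

Because each component of the naive estimator is nondecreasing, dividing by the summed value at $s_0$ keeps the sum at or below 1 for every $t\le s_0$.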
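
We also defined a "truncated naive estimator" $\tilde F^{*}_{nk}$. If $\tilde F_{n+}(T_{(n)})\le 1$, then $\tilde F^{*}_{nk} = \tilde F_{nk}$ for all $k = 1,\ldots,K$. Otherwise, we let $S_n = \min\{t : \tilde F_{n+}(t) > 1\}$ and define

$$\tilde F^{*}_{nk}(t) = \begin{cases}\tilde F_{nk}(t), & t < S_n,\\ \tilde F_{nk}(S_n-) + \alpha_{nk}, & t\ge S_n,\end{cases}$$

where

$$\alpha_{nk} = \frac{\tilde F_{nk}(S_n) - \tilde F_{nk}(S_n-)}{\tilde F_{n+}(S_n) - \tilde F_{n+}(S_n-)}\,\{1 - \tilde F_{n+}(S_n-)\},$$

for $k = 1,\ldots,K$. Note that $\tilde F^{*}_{n+}(t)\le 1$ for all $t\in\mathbb R$.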
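The truncation can be sketched in the same spirit. The Python function below is a hypothetical illustration that works on right-continuous step functions stored as values at sorted jump times (taken to be zero before the first time, as an assumption). It reads the definition as setting, for $t\ge S_n$, each component equal to its left limit at $S_n$ plus $\alpha_{nk}$. The remaining mass $1 - \tilde F_{n+}(S_n-)$ is therefore split in proportion to the component jumps at $S_n$, and the sum equals 1 from $S_n$ on.

```python
from typing import List, Sequence


def truncated_naive(fits: Sequence[Sequence[float]]) -> List[List[float]]:
    """Truncate naive-estimator components so that their sum never exceeds 1.

    fits[k][i] is the value of component k at the i-th sorted jump time; values
    before the first jump time are taken to be 0 (assumption).
    """
    n_times = len(fits[0])
    totals = [sum(f[i] for f in fits) for i in range(n_times)]
    if totals[-1] <= 1:  # sum at the largest observation time
        return [list(f) for f in fits]
    i_s = next(i for i, tot in enumerate(totals) if tot > 1)  # index of S_n
    left = [f[i_s - 1] if i_s > 0 else 0.0 for f in fits]      # F_nk(S_n -)
    left_total = sum(left)
    jump_total = totals[i_s] - left_total                       # positive here
    out = []
    for f, lk in zip(fits, left):
        alpha = (f[i_s] - lk) / jump_total * (1 - left_total)   # alpha_nk
        out.append([f[i] if i < i_s else lk + alpha for i in range(n_times)])
    return out


fits = [[1 / 3, 1 / 3, 1 / 3, 1.0], [0.0, 0.0, 0.5, 0.5]]  # toy step functions
print(truncated_naive(fits))
```

In the toy input the sum first exceeds 1 at the last time point, where only the first component jumps, so it absorbs all of the remaining mass $1/6$.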

We computed the mean squared error (MSE) of all estimators on a grid with points $0, 0.01, 0.02, \ldots, 3.0$. Subsequently, we computed relative MSEs by dividing the MSE of the MLE by the MSE of each estimator. The results are shown in Figure 3. Note that the MLE tends to have the best MSE, for all sample sizes and for all values of $t$. Only for sample size 250 and small values of $t$ does the scaled naive estimator outperform the other estimators; this anomaly is caused by the fact that this estimator is scaled down so much that it has a very small variance. The difference between the MLE and the naive estimators is most pronounced for large values of $t$. This was also observed by [12], who explained it by noting that only the MLE is guaranteed to satisfy the constraint $F_+(t)\le 1$ at large values of $t$. We believe that this constraint is indeed important for small sample sizes, but the theory developed in this paper indicates that it does not play any role asymptotically. Asymptotically, the difference can be explained by the extra term $(a_{K+1}/a_k)\{V_+ - \hat H_+\}$ in the limiting process of the MLE (see Proposition 2.4), since the factor $a_{K+1}/a_k = F_{0k}(t)/F_{0,K+1}(t)$ is increasing in $t$.
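The relative MSE used in Figure 3 can be computed as in the following Python sketch. It assumes that, for one sample size, the estimates of $F_{0k}$ from all simulation runs have been evaluated on the grid $0, 0.01, \ldots, 3.0$ and stored run by run; the function names are hypothetical. Values below 1 indicate that the MLE has the smaller MSE at that grid point.

```python
from typing import List, Sequence

GRID = [round(0.01 * i, 2) for i in range(301)]  # 0, 0.01, ..., 3.0


def pointwise_mse(estimates: Sequence[Sequence[float]],
                  truth: Sequence[float]) -> List[float]:
    """estimates[r][g] is the estimate in run r at grid point g; truth[g] = F_0k(GRID[g])."""
    n_runs = len(estimates)
    return [sum((run[g] - truth[g]) ** 2 for run in estimates) / n_runs
            for g in range(len(truth))]


def relative_mse(mle_runs: Sequence[Sequence[float]],
                 other_runs: Sequence[Sequence[float]],
                 truth: Sequence[float]) -> List[float]:
    """MSE of the MLE divided by the MSE of another estimator, per grid point."""
    mse_mle = pointwise_mse(mle_runs, truth)
    mse_other = pointwise_mse(other_runs, truth)
    return [a / b for a, b in zip(mse_mle, mse_other)]


# Toy example with two runs and two grid points (not simulation output).
print(relative_mse([[0.3, 0.6], [0.1, 0.4]], [[0.4, 0.7], [0.0, 0.3]], [0.2, 0.5]))
```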

Among the naive estimators, the truncated naive estimator behaves better than the naive estimator for sample sizes 250 and 2500, especially for large values of $t$. However, for sample size 25,000 we can barely distinguish the three naive estimators. The latter can be explained by the fact that all versions of the naive estimator are asymptotically equivalent for $t\in[0,3]$, since consistency of the naive estimator ensures that $\lim_{n\to\infty}\tilde F_{n+}(3) < 1$ almost surely. On the other hand, the three naive estimators are clearly less efficient than the MLE for sample size 25,000. These results support our theoretical finding that the form of the likelihood (and not the constraint $F_+\le 1$) causes the different asymptotic behavior of the MLE and the naive estimator.

Fig. 3. Relative MSEs, computed by dividing the MSE of the MLE by the MSE of the other estimators. All MSEs were computed over 1000 simulations for each sample size, on the grid $0, 0.01, 0.02, \ldots, 3.0$.

Finally, we note that our simulations consider estimation of $F_{0k}(t)$, for $t$ on a grid. Alternatively, one can consider estimation of certain smooth functionals of $F_{0k}$. The naive estimator was suggested to be asymptotically efficient for this purpose [12], and [14], Chapter 7, proved that the same is true for the MLE. A simulation study that compares the estimators in this setting is presented in [14], Chapter 8.2.

5. Technical proofs. 

Proof of Lemma 2.11. Let $k\in\{1,\ldots,K\}$ and $j\in\mathbb N = \{0,1,\ldots\}$. Note that for $M$ large, we have for all $w < j+1$:

$$C\{(s_{jM} - w)\vee(s_{jM} - w)^{3/2}\}\le \tfrac{1}{2}\,b\,(s_{jM} - w)^2.$$

Hence, the probability in the statement of Lemma 2.11 is bounded above by 




In turn, this probability is bounded above by 

(40) Y.P\ s^iP / dSk{u)>\kjA. 

where $\lambda_{kjq} = b\{s_{jM} - (j - q + 1)\}^2/2 = b\{Mv(j) + q - 1\}^2/2$.
We write the $q$th term in (40) as

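$$P\Big(\sup_{w\in[j-q,\,j-q+1)} S_k(s_{jM} - w) > \lambda_{kjq}\Big) \le P\Big(\sup_{w\in[0,\,Mv(j)+q)} S_k(w) > \lambda_{kjq}\Big) = P\Big(\sup_{w\in[0,1)} S_k(w) > \frac{\lambda_{kjq}}{\sqrt{Mv(j)+q}}\Big)$$
$$\le P\Big(\sup_{w\in[0,1)} B_k(w) > \frac{\lambda_{kjq}}{b_k\sqrt{Mv(j)+q}}\Big) \le 2P\Big(N(0,1) > \frac{\lambda_{kjq}}{b_k\sqrt{Mv(j)+q}}\Big) \le 2b_{kjq}\exp\Big\{-\frac{1}{2}\Big(\frac{\lambda_{kjq}}{b_k\sqrt{Mv(j)+q}}\Big)^2\Big\},$$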

where $b_k$ is the standard deviation of $S_k(1)$, $b_{kjq} = b_k\sqrt{Mv(j)+q}/(\lambda_{kjq}\sqrt{2\pi})$, and $B_k(\cdot)$ is standard Brownian motion. Here we used standard properties of Brownian motion. The second to last inequality is given in, for example, [16], (6), page 33, and the last inequality follows from Mills' ratio (see [3], (10)). Note that $b_{kjq}\le d$ for all $j\in\mathbb N$, for some $d > 0$ and all $M > 3$. It follows that (40) is bounded above by

$$2d\sum_{q=0}^{\infty}\exp\Big\{-\frac{1}{2}\Big(\frac{b\{Mv(j)+q-1\}^2}{2b_k\sqrt{Mv(j)+q}}\Big)^2\Big\},$$
which in turn is bounded above by $d_1\exp\{-d_2(Mv(j))^3\}$, for some constants $d_1$ and $d_2$, using $(a+b)^3\ge a^3 + b^3$ for $a, b\ge 0$. □
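The two Gaussian facts used in this proof can be checked numerically. The reflection principle gives $P(\sup_{w\in[0,1]}B(w) > x) = 2P(N(0,1) > x)$ for $x\ge 0$, and Mills' ratio gives $P(N(0,1) > x)\le e^{-x^2/2}/(x\sqrt{2\pi})$ for $x > 0$. The Python sketch below evaluates the resulting bound $2b_{kjq}\exp\{-x^2/2\}$ on the $q$th term of (40), with $x = \lambda_{kjq}/(b_k\sqrt{Mv(j)+q})$. The function names are hypothetical, and the parameter values in the example are arbitrary and purely illustrative.

```python
import math


def normal_tail(x: float) -> float:
    """P(N(0,1) > x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mills_bound(x: float) -> float:
    """Mills' ratio upper bound exp(-x^2/2) / (x sqrt(2 pi)), valid for x > 0."""
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))


def qth_term_bound(b: float, b_k: float, mv: float, q: int) -> float:
    """Bound 2 b_kjq exp(-x^2/2) on the q-th term of (40); mv stands for M v(j).

    Requires mv + q - 1 > 0 so that the standardized threshold x is positive.
    """
    lam = b * (mv + q - 1) ** 2 / 2.0          # lambda_kjq
    x = lam / (b_k * math.sqrt(mv + q))        # standardized threshold
    return 2.0 * mills_bound(x)                # equals 2 b_kjq exp(-x^2 / 2)


for x in (0.5, 1.0, 2.0, 4.0):
    assert normal_tail(x) <= mills_bound(x)

# Illustrative values b = b_k = 1 and M v(j) = 4: the bounds decay very fast in q.
print([qth_term_bound(1.0, 1.0, 4.0, q) for q in range(4)])
```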

Proof of Lemma 2.12. This proof is completely analogous to the proof 
of Lemma 4.14 of [8], upon replacing Fnkiu) by Fk{u), Fokiu) by Fok{u), 
dG{u) by du, Snk{-) by Sk{-), Tnkj by Tkj, SnjM by sjm, and AnjM by 
AjM- The only difference is that the second term on the right-hand side 
of equation (69) in [8] vanishes, since this term comes from the remainder term $R_{nk}(s,t)$, and we do not have such a remainder term in the limiting characterization given in Proposition 3.2. □